Preface


This book introduces two closely related subjects: linear algebra and group theory 
for undergrads in engineering, physics, chemistry, and (applied) math. This is 
indeed an interdisciplinary point of view: math and its applications go hand in hand. 

The linear algebra part introduces both vectors and matrices, with a lot of
examples in two and three dimensions: the small 2 × 2 Lorentz matrix, and the 3 × 3
rotation matrix. This prepares the reader quite well for group theory: 2 × 2
Moebius and Pauli matrices, 3 × 3 projective matrices, and so on. This way, the
reader gets ready for higher dimensions as well: big Fourier and Markov matrices,
operators in quantum mechanics, and stiffness and mass matrices in (high-order)
finite elements.

It makes sense to place the matrices in a new group. This may help mirror (or 
represent) many other groups as well. This is how linear algebra paves the way to 
group theory. Thanks to the language of matrices, groups become much more 
concrete, and easy to store on the computer. 

Thus, the book offers a unique approach: it introduces both linear and modern
algebra at the same time. This shows quite clearly how related these topics really
are, and how they can benefit from each other, and complement each other. Indeed, at
the end of the book, we work the other way around: group theory paves the way to
linear algebra to uncover the electronic structure in the atom.


How to Use the Book in Academic Courses? 


The book could be used as a textbook in undergraduate courses: 


• linear algebra for physicists and engineers (Chaps. 1-4 and 7),
• group theory and its geometrical applications (Chaps. 5-6),
• special relativity: an algebraic point of view (Chaps. 1, 4, and 15),
• quantum mechanics: an algebraic point of view (Chaps. 1, 7, and 14),
• high-order finite elements in 3-D (Chaps. 8-13).

Indeed, Chaps. 1-4 introduce linear algebra, with applications in mechanics and
statistics. Chapters 5-6, on the other hand, introduce group theory, with applications
in projective geometry. Furthermore, Chaps. 8-13 introduce high-order finite
elements to design a regular mesh (and an optimal spline) in a complicated 3-D
domain. Finally, Chaps. 14-15 assemble the stiffness and mass matrices in
advanced applications in quantum chemistry and general relativity.

The book is nearly self-contained: the only prerequisite is elementary calculus,
which could be taken concurrently with these courses. There are plenty of
examples and figures to make the material more visual and friendly. All figures are
referenced in the text.

Each chapter ends with a lot of relevant exercises with hints or even solutions.
This may help the reader follow the theory and develop new results on their own.


Roadmaps: How to Read the Book? 


Fig. 1 How could a physicist/chemist/engineer read the book? (Roadmap: Chapters 1-2,
linear algebra → Chapter 4, special relativity → Chapters 7 and 14, quantum mechanics →
Chapter 15, general relativity.)


Here are a few different ways to read the book. They are illustrated in a few 
roadmaps (Figs. 1-3): 


• Physicists, chemists, and engineers might want to

  – read Chaps. 1-2 about linear algebra, with applications in mechanics.
  – Then, proceed to Chap. 4, where matrices are used to introduce special
    relativity.
  – Finally, conclude with more advanced applications: Chaps. 7 and 14 about
    quantum mechanics, and Chap. 15 about general relativity.

• Computer scientists, on the other hand, might want to

  – start from Chap. 1 about linear algebra.
  – Then, proceed to Chap. 3, which uses a Markov matrix to design a search
    engine.
  – Finally, conclude with Chap. 5 about group theory, and Chap. 6 that uses it
    in computer graphics.

• Finally, numerical analysts and engineers could also

  – start from Chaps. 1-2 to get introduced to linear algebra.
  – Then, skip to Chaps. 8-13 about finite elements, meshes, and splines.
  – Finally, conclude with Chaps. 14-15 that assemble the stiffness and mass
    matrices in advanced physical systems.


Fig. 2 How could a computer scientist read the book? (Roadmap: Chapter 1, linear
algebra → Chapter 3, search engines → Chapter 5, group theory → Chapter 6, applications
in computer graphics.)

Fig. 3 How could a numerical analyst or an engineer read the book? (Roadmap:
Chapters 1-2, linear algebra → Chapters 8-12, finite elements in 3-D → Chapter 13,
splines in 3-D → Chapters 14-15, advanced applications in physics.)


Part I
Introduction to Linear Algebra 


We’re already familiar with elementary algebraic objects: numbers (or scalars), along 
with the arithmetic operations between them. In this part, on the other hand, we look 
at more complicated structures: vectors and matrices. Fortunately, it is also possible to
define arithmetic (or algebraic) operations between them.

With these new operations, the vectors make a new linear space. The (nonsingular)
matrices, on the other hand, make yet another important structure: a group. In a
group, the associative law must hold. The commutative law, on the other hand, does not
necessarily hold.

The matrix is not just an algebraic object. It may also have a geometrical meaning: 
a mapping or transformation. This is most useful in many applications. 

In special relativity, for example, the Lorentz transformation can be written as a 
small 2 x 2 matrix. In geometrical mechanics, on the other hand, 3 x 3 matrices 
are more useful. Finally, a yet bigger matrix is often used in stochastic analysis, 
to model a Markov chain in a graph. This has an interesting application in modern
search engines on the Internet.


Chapter 1
Vectors and Matrices


Here is what we are going to do in this chapter. What is a vector? It is a finite list of 
(real) numbers: scalars, or components. 

In a geometrical context, the components have yet another name: coordinates. In 
the two-dimensional Cartesian plane, for example, a vector contains two coordinates: 
the x- and y-coordinates. This is why the vector is often denoted by the pair (x, y). 
Geometrically, this can also be viewed as an arrow, leading from the origin (0, 0) to
the point $(x, y) \in \mathbb{R}^2$. Here, $\mathbb{R}$ is the real axis, $\mathbb{R}^2$ is the Cartesian plane, and “$\in$”
means “belongs to”.

In the three-dimensional Cartesian space, on the other hand, the vector also contains
a third coordinate: the z-coordinate. This is why the vector is often denoted by
the triplet $(x, y, z) \in \mathbb{R}^3$.

Still, vectors are more than just lists of numbers. They also have linear arithmetic 
operations: addition, multiplication by a scalar, and more. With these operations, the 
vectors form a complete linear space. 

What is a matrix? It is a rectangular frame, full of numbers: scalars or elements, 
ordered row by row in the matrix. 

Unlike the vector, the matrix has a new arithmetic operation: multiplication. A 
matrix could multiply a vector, or be applied to a vector. This is done from the left: 
first write the matrix, then the vector. Likewise, a matrix could multiply another 
matrix. 

In geometrical terms, the matrix can also be viewed as a linear mapping (or 
transformation) from one vector space to another. To map a vector, the matrix should 
be applied to it. This produces the new image (or target) vector. With this new 
interpretation, the matrix is now more active: it acts upon a complete vector space. 
For example, the matrix could simply rotate the original vector (see exercises at the
end of Chap. 2).


1.1 Vectors in Two and Three Dimensions


1.1.1 Two-Dimensional Vectors 


What is a vector? It is a finite list or sequence: a finite set of numbers, ordered one 
by one in a row. In this list, each number is also called a scalar or a component. The 
total number of components is often denoted by the natural number n. 

In geometrical terms, on the other hand, the components are also viewed as coordinates.
This way, the vector also takes a new interpretation: an n-dimensional vector,
in a new n-dimensional linear space.

In the trivial case of n = 1, for example, the vector contains just one component 
or coordinate. In this degenerate case, the one-dimensional vector (x) mirrors the 
scalar x: both can be interpreted geometrically as the point x on the real axis. 

In the more interesting case of n = 2, on the other hand, the two-dimensional 
vector is a pair of two numbers: x and y. Here, the first component x serves as 
the horizontal coordinate, whereas the second component y serves as the vertical 
coordinate. 

This way, the original vector (x, y) takes its geometrical meaning as well: the new
point (x, y) in the Cartesian plane. To illustrate the vector, draw an arrow from the
origin (0, 0) to the point $(x, y) \in \mathbb{R}^2$ (Fig. 1.1).


1.1.2 Adding Vectors 


Consider two vectors, $(x, y)$ and $(\hat{x}, \hat{y})$, that lie somewhere in the Cartesian plane. How
to add them to each other? For this purpose, use the parallelogram rule (Fig. 1.2). After
all, we already have three points: $(0, 0)$, $(x, y)$, and $(\hat{x}, \hat{y})$. To make a parallelogram,
we need just one more point. This new point will indeed be the required sum
$(x, y) + (\hat{x}, \hat{y})$.


Fig. 1.1 The vector (x, y) is drawn as an arrow, issuing from the origin (0, 0), and
leading to the point (x, y) in the Cartesian plane

Fig. 1.2 How to add $(x, y)$ to $(\hat{x}, \hat{y})$? Use them as sides in a new parallelogram, and
let their sum be the diagonal of this parallelogram. This is the parallelogram rule

Unfortunately, this is still too geometrical. After all, we can never trust our own 
human eye or hand to draw this accurately. Instead, we better have a more algebraic 
method, independent of geometry. 

Fortunately, the above geometrical rule also has an algebraic face: add component 
by component: 


$$(x, y) + (\hat{x}, \hat{y}) = (x + \hat{x}, y + \hat{y}).$$


This way, in the required sum, each individual coordinate is easy to calculate: it is 
just the sum of the corresponding coordinates in the original vectors. This algebraic 
formulation is much more practical: it is easy to implement on the computer, and to 
extend to higher dimensions as well. 
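
To see how easily this carries over to the computer, here is a minimal Python sketch of the component-by-component rule. The function name `add` and the use of tuples to store vectors are our own choices for illustration, not notation from the text.

```python
def add(u, v):
    """Add two vectors of the same dimension, component by component."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return tuple(a + b for a, b in zip(u, v))


print(add((1, 2), (3, -1)))        # (4, 1)
print(add((1, 2, 3), (4, 5, 6)))   # (5, 7, 9)
```

The same function handles two, three, or more dimensions, because it never refers to the geometry.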


1.1.3 Scalar Times Vector 


A vector can also be multiplied by a number (or scalar, or factor), either from the left
(scalar times vector) or from the right (vector times scalar). What is the result? It is
a new vector that could be either shorter or longer, but must still lie on the same
line through the origin: it points in the same direction as before if the scalar is
positive, and in the opposite direction if the scalar is negative. After all, the ratio
between the coordinates remains the same. The only thing that has changed is the length
(or magnitude), and possibly the sign (Fig. 1.3).

Unfortunately, this is still too geometrical: it gives no practical algorithm. After 
all, we can never trust our human eye or hand to draw the new vector accurately. How 
to do this algebraically? Easy: just multiply coordinate by coordinate. For example, 
to multiply the vector (x, y) by the scalar a from the left, define 


a(x, y) = (ax, ay). 


Likewise, to multiply from the right, define 


$$(x, y)a = (xa, ya),$$


which is just the same as before. In this sense, the multiplication is indeed commutative.


Fig. 1.3 How to multiply (or stretch) the original vector (x, y) by factor 2? Well,
multiply coordinate by coordinate, to produce the new vector 2(x, y) = (2x, 2y), which is
twice as long

Later on, we’ll see a lot of examples with two-dimensional vectors. In Sect. 1.3.1,
we’ll see that a complex number is actually a two-dimensional vector. Furthermore,
in the exercises at the end of Chap. 2, we’ll see how to rotate a vector. Moreover, in
Chaps. 4 and 5, we’ll see Lorentz and Moebius transformations. Here, however, we
have no time for examples. To see how algebra and geometry go hand in hand, we
better go ahead and extend the above to three spatial dimensions as well.


1.1.4 Three-Dimensional Vectors 


So far, we’ve considered the two-dimensional case n = 2. In the three-dimensional
case n = 3, on the other hand, we introduce one more dimension: the z dimension.
This way, a vector is now a triplet of three (rather than two) components or
coordinates: (x, y, z).

In geometrical terms, the vector (x, y, z) represents a point in the three-dimensional
Cartesian space, with the horizontal coordinates x and y, and the height coordinate
z. This is why the vector is often illustrated as an arrow, issuing from the origin
(0, 0, 0), and leading to the point $(x, y, z) \in \mathbb{R}^3$ (Fig. 1.4).

As in Sect. 1.1.2, addition is still made coordinate by coordinate: 


$$(x, y, z) + (\hat{x}, \hat{y}, \hat{z}) = (x + \hat{x}, y + \hat{y}, z + \hat{z}).$$


Furthermore, as in Sect. 1.1.3, multiplication by a scalar is still made coordinate by 
coordinate as well. This could be done either from the left: 


$$a(x, y, z) = (ax, ay, az),$$


or from the right:

$$(x, y, z)a = (xa, ya, za).$$


In both cases, the result is the same. In this sense, the commutative law indeed applies. 

Later on, we’ll see a lot of examples. In Chap. 2, in particular, we’ll rotate a
three-dimensional vector. Here, however, we have no time for this. After all, the complete
algebraic picture is far more general. To realize this, we better proceed to a yet higher
dimension, with no apparent geometrical meaning anymore.


Fig. 1.4 The three-dimensional vector (x, y, z) is an arrow, issuing from the origin
(0, 0, 0), and leading to the point (x, y, z) in the Cartesian space


1.2 Vectors in Higher Dimensions


1.2.1 Multidimensional Vectors 


So far, our vectors had a concrete geometrical meaning. For n > 3, on the other hand, 
they are no longer geometrical, but only algebraic. 

The n-dimensional vector is first of all a set: a finite list or sequence of n individual 
numbers (components): 


$$v = (v_1, v_2, v_3, \ldots, v_n) \in \mathbb{R}^n,$$

where $\mathbb{R}$ is the real axis and $\mathbb{R}^n$ is the n-dimensional space. Still, the vector is more
than that: it is also an algebraic object, with all sorts of arithmetic operations. To see
this, consider yet another vector:

$$u = (u_1, u_2, \ldots, u_n) \in \mathbb{R}^n.$$

This vector can now be added to v, component by component:

$$u + v = (u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n).$$


1.2.2 Associative Law


Fortunately, this operation is associative. Indeed, let $w = (w_1, w_2, \ldots, w_n)$ be yet
another vector in $\mathbb{R}^n$. Now, to sum these three vectors, ordering doesn’t matter: either
add the first two to each other and then add the third one as well, or start from the
second and third vectors and add them to each other, and then add the first as well:

$$\begin{aligned}
u + (v + w) &= (u_1 + (v_1 + w_1), u_2 + (v_2 + w_2), \ldots, u_n + (v_n + w_n)) \\
&= ((u_1 + v_1) + w_1, (u_2 + v_2) + w_2, \ldots, (u_n + v_n) + w_n) \\
&= (u + v) + w.
\end{aligned}$$


This is indeed the associative law for addition. 


Still, the associative law applies not only to addition but also to multiplication by 
a scalar. 


1.2.3 The Origin 


In $\mathbb{R}^n$, there is one special vector: the zero vector (or the origin), with n zeroes:

$$0 = (0, 0, 0, \ldots, 0) \quad (n \text{ zeroes}).$$

In what way is this vector special? Well, it is the only vector that can be added to
any n-dimensional vector v, with no effect whatsoever:

$$0 + v = v + 0 = v.$$


1.2.4 Multiplication and Its Laws 


Now, how to multiply a vector by a scalar $a \in \mathbb{R}$, either from the left or from the
right? Well, as before, this is done component by component:

$$av = va = (av_1, av_2, \ldots, av_n).$$

In this sense, multiplication is indeed commutative. Fortunately, it is associative as
well. To see this, consider yet another scalar b:

$$b(av) = b(av_1, av_2, \ldots, av_n) = (bav_1, bav_2, \ldots, bav_n) = (ba)v.$$


1.2.5 Distributive Laws 


Furthermore, the above arithmetic operations are also distributive in two senses. On 
one hand, you can push the same vector v into parentheses: 


$$\begin{aligned}
(a + b)v &= ((a + b)v_1, (a + b)v_2, \ldots, (a + b)v_n) \\
&= (av_1 + bv_1, av_2 + bv_2, \ldots, av_n + bv_n) \\
&= av + bv.
\end{aligned}$$

On the other hand, you can push the same scalar a into parentheses:

$$\begin{aligned}
a(u + v) &= (a(u_1 + v_1), a(u_2 + v_2), \ldots, a(u_n + v_n)) \\
&= (au_1 + av_1, au_2 + av_2, \ldots, au_n + av_n) \\
&= au + av.
\end{aligned}$$

This completes the definition of the new vector space $\mathbb{R}^n$, and the linear algebraic
operations in it.
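
The laws of this section can also be checked on the computer for a few sample vectors. The following Python sketch is only a spot check on one example, not a proof; the helper names `add` and `scale`, and the sample numbers, are hypothetical choices of ours. Integers are used so that all comparisons are exact.

```python
def add(u, v):
    """Componentwise sum of two vectors of the same dimension."""
    return tuple(a + b for a, b in zip(u, v))


def scale(a, v):
    """Multiply the vector v by the scalar a, component by component."""
    return tuple(a * x for x in v)


u, v, w = (1, 2, 3, 4), (0, -1, 5, 2), (7, 7, -2, 1)   # three vectors, n = 4
a, b = 3, -2                                           # two scalars
zero = (0,) * len(u)                                   # the origin

assert add(u, add(v, w)) == add(add(u, v), w)                  # u + (v + w) = (u + v) + w
assert add(zero, v) == add(v, zero) == v                       # 0 + v = v + 0 = v
assert scale(b, scale(a, v)) == scale(b * a, v)                # b(av) = (ba)v
assert scale(a + b, v) == add(scale(a, v), scale(b, v))        # (a + b)v = av + bv
assert scale(a, add(u, v)) == add(scale(a, u), scale(a, v))    # a(u + v) = au + av
print("all laws hold for this sample")
```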
1.3 Complex Numbers and Vectors


1.3.1 Complex Numbers 


The two-dimensional vectors defined above may also help to model complex numbers.
As a matter of fact, complex numbers have just one more algebraic operation:
multiplication. Let’s explain this briefly.

The negative number −1 has no real square root: there is no real number whose square
is −1. Fortunately, it is still possible to introduce a new auxiliary (not real) number:
the imaginary number i. This way, i is a new number whose square is −1:

$$i^2 = -1,$$

or

$$i = \sqrt{-1}.$$

Because it lies outside the real axis, i may now span a new vertical axis, perpendicular
to the original real axis (Fig. 1.5).
For this purpose, place i at (0, 1), above the origin. This way, i spans the entire
imaginary axis:

$$\{ bi = (0, b) \mid -\infty < b < \infty \}.$$

Here, b is some real number. In the above, the algebraic multiple bi also takes the
geometrical form (0, b): a new point on the vertical imaginary axis. In particular, if
b = 1, then we obtain the original imaginary number i = (0, 1) once again.
Together, the real and imaginary axes span the entire complex plane:

$$\mathbb{C} = \{ a + bi = (a, b) \mid -\infty < a, b < \infty \}.$$

Here, a and b are some real numbers. The new complex number a + bi also takes its
geometrical place $(a, b) \in \mathbb{C}$ (Fig. 1.6).

Fig. 1.5 The imaginary number i. The arrow leading from the origin to i makes a right angle with
the real axis. In $i^2 = -1$, on the other hand, this angle doubles, to make a flat angle with the positive
part of the real axis

Fig. 1.6 The complex plane $\mathbb{C}$. The imaginary number $i = \sqrt{-1}$ is at (0, 1). A complex
number a + bi is at (a, b)

So far, the complex plane is defined geometrically only. What is its algebraic
meaning? Well, since complex numbers are two-dimensional vectors, we already
know how to add them to each other and multiply them by a real scalar. How to
multiply them by each other? Well, we already know that

$$i^2 = -1.$$

After all, this is how i has been defined in the first place.

What is the geometrical meaning of this algebraic equation? Well, look at the
negative number −1. What angle does it make with the positive part of the real axis?
A flat angle of 180°. The imaginary number i, on the other hand, makes a right angle
of 90° with the real axis (Fig. 1.5). Thus, multiplying i by itself means adding yet
another right angle, to make a flat angle together.

Let’s go ahead and extend this linearly to the entire complex plane. In other words,
let’s use the distributive and commutative laws to multiply a + bi times c + di (where
a, b, c, and d are some real numbers):

$$(a + bi)(c + di) = a(c + di) + bi(c + di) = ac + adi + bci + bdi^2 = ac - bd + (ad + bc)i.$$

For example, look at the special case

$$c = a \quad \text{and} \quad d = -b.$$

This way, c + di is the complex conjugate of a + bi, denoted by a small bar on top:

$$c + di = a - bi = \overline{a + bi}.$$

In this case, the above product is a new real number: the squared absolute value of
a + bi:

$$|a + bi|^2 = (a + bi)(a - bi) = a^2 + b^2.$$

Thanks to this definition, the absolute value is the same as the length of the vector
(a, b), obtained from Pythagoras' theorem. If this is nonzero, then we could divide
by it:

$$(a + bi) \frac{a - bi}{a^2 + b^2} = 1.$$

So, we now also have the reciprocal (or inverse) of a + bi:

$$(a + bi)^{-1} = \frac{a - bi}{a^2 + b^2}.$$
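
The formulas above can be tried out by storing the complex number a + bi as the pair (a, b). This is a minimal Python sketch; the names `cmul`, `conj`, and `reciprocal` are our own, and the reciprocal assumes integer (or rational) parts so that `fractions.Fraction` keeps the arithmetic exact. Python's built-in `complex` type is used only as an independent check.

```python
from fractions import Fraction


def cmul(z, w):
    """Multiply z = (a, b) by w = (c, d): (ac - bd) + (ad + bc)i."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)


def conj(z):
    """Complex conjugate: a + bi becomes a - bi."""
    a, b = z
    return (a, -b)


def reciprocal(z):
    """(a - bi) / (a^2 + b^2), defined only for a nonzero z."""
    a, b = z
    r2 = a * a + b * b          # squared absolute value
    if r2 == 0:
        raise ZeroDivisionError("zero has no reciprocal")
    return (Fraction(a, r2), Fraction(-b, r2))


z, w = (3, 4), (1, 2)                    # 3 + 4i and 1 + 2i
print(cmul((0, 1), (0, 1)))              # (-1, 0): i times i is -1
print(cmul(z, conj(z)))                  # (25, 0): 3^2 + 4^2
assert cmul(z, reciprocal(z)) == (1, 0)  # z times its reciprocal is 1
assert complex(*cmul(z, w)) == complex(*z) * complex(*w)
```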


1.3.2 Complex Vectors


In our n-dimensional vector, each individual component can now be a complex number.
This yields a new vector space: $\mathbb{C}^n$. The only difference is that the components
(and the scalar that may multiply them) can now be not only real but also complex.

Because the arithmetic operations are defined in the same way, the same associative
and distributive laws still hold. In other words, the algebraic operations remain
linear. For this reason, $\mathbb{C}^n$ could be viewed as a natural extension of the original
vector space $\mathbb{R}^n$.


1.4 Rectangular Matrix 


1.4.1 Matrices 


A matrix is a frame, full of numbers. In an m by n (or m × n) rectangular matrix,
there are mn numbers: m rows, and n columns (Fig. 1.7).
The original m by n matrix is often denoted by

$$A = (a_{i,j})_{1 \le i \le m,\ 1 \le j \le n}.$$

Here, the individual element $a_{i,j}$ is some number, placed in the ith row, and in the
jth column. This way, A contains n columns, ordered one by one, left to right. Each
column contains m individual elements (numbers), ordered top to bottom.


Fig. 1.7 A rectangular m × n matrix: there are m rows, and n columns

Let’s look at each column as an individual object, which could serve as a new component
in a new row vector. From this point of view, A is actually an n-dimensional
vector, with n “components” in a row: each “component” is not just a scalar, but
a complete vector in its own right: an m-dimensional column vector, containing m
numbers:

$$A = \left( v^{(1)} \mid v^{(2)} \mid v^{(3)} \mid \cdots \mid v^{(n)} \right),$$

where $v^{(1)}, v^{(2)}, \ldots, v^{(n)}$ are column vectors in $\mathbb{R}^m$. This way, for $1 \le j \le n$, $v^{(j)}$ is
the jth column in A:

$$v^{(j)} = \begin{pmatrix} v_1^{(j)} \\ v_2^{(j)} \\ \vdots \\ v_m^{(j)} \end{pmatrix} = \begin{pmatrix} a_{1,j} \\ a_{2,j} \\ \vdots \\ a_{m,j} \end{pmatrix}.$$

In this column vector, for $1 \le i \le m$, the ith component is the matrix element

$$v_i^{(j)} = a_{i,j}.$$

For example, if m = 3 and n = 4, then A is a 3 × 4 matrix:

$$A = \begin{pmatrix} a_{1,1} & a_{1,2} & a_{1,3} & a_{1,4} \\ a_{2,1} & a_{2,2} & a_{2,3} & a_{2,4} \\ a_{3,1} & a_{3,2} & a_{3,3} & a_{3,4} \end{pmatrix}.$$

In this form, A could also be viewed as a list of three rows. Each row contains four
numbers, ordered left to right. In total, A contains 12 elements.


1.4.2 Adding Matrices 


Let

$$B = (b_{i,j})_{1 \le i \le m,\ 1 \le j \le n}$$

be yet another m × n matrix. This way, B can now be added to A, element by element:

$$A + B = (a_{i,j} + b_{i,j})_{1 \le i \le m,\ 1 \le j \le n}.$$

Alternatively, B could also be written in terms of its columns:

$$B = \left( u^{(1)} \mid u^{(2)} \mid u^{(3)} \mid \cdots \mid u^{(n)} \right),$$

where $u^{(1)}, u^{(2)}, \ldots, u^{(n)}$ are column vectors in $\mathbb{R}^m$. This way, B could also be added
column by column:

$$A + B = \left( v^{(1)} + u^{(1)} \mid v^{(2)} + u^{(2)} \mid v^{(3)} + u^{(3)} \mid \cdots \mid v^{(n)} + u^{(n)} \right).$$

It is easy to see that this operation is associative:

$$(A + B) + C = A + (B + C),$$

where C is yet another m × n matrix.


1.4.3 Scalar-Times-Matrix


To multiply A by a real number $r \in \mathbb{R}$ (either from the left or from the right), just
multiply element by element:

$$rA = Ar = (r a_{i,j})_{1 \le i \le m,\ 1 \le j \le n}.$$

Clearly, this operation is associative as well: if $q \in \mathbb{R}$ is yet another scalar, then

$$q(rA) = q (r a_{i,j})_{1 \le i \le m,\ 1 \le j \le n} = (q r a_{i,j})_{1 \le i \le m,\ 1 \le j \le n} = (qr)A.$$

Furthermore, the above operations are also distributive in two senses. On one hand,
you can push the same matrix A into parentheses:

$$(r + q)A = rA + qA.$$

On the other hand, you can push the same scalar r into parentheses:

$$r(A + B) = rA + rB.$$
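
Here is a short Python sketch of these element-by-element operations. A matrix is stored as a list of rows, which is one possible convention among several; the names `mat_add` and `mat_scale` and the sample matrices are hypothetical. The assertions are a spot check of the laws above on one example, not a proof.

```python
def mat_add(A, B):
    """Add two matrices of the same dimensions, element by element."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def mat_scale(r, A):
    """Multiply every element of A by the scalar r."""
    return [[r * a for a in row] for row in A]


A = [[1, 2, 3],
     [4, 5, 6]]          # a 2 x 3 matrix
B = [[0, 1, 0],
     [-1, 0, 2]]         # another 2 x 3 matrix
r, q = 2, 3

print(mat_add(A, B))     # [[1, 3, 3], [3, 5, 8]]
assert mat_scale(q, mat_scale(r, A)) == mat_scale(q * r, A)                       # q(rA) = (qr)A
assert mat_scale(r + q, A) == mat_add(mat_scale(r, A), mat_scale(q, A))           # (r + q)A = rA + qA
assert mat_scale(r, mat_add(A, B)) == mat_add(mat_scale(r, A), mat_scale(r, B))   # r(A + B) = rA + rB
```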


1.4.4 Matrix-Times-Vector


Recall that A has already been written column by column:

$$A = \left( v^{(1)} \mid v^{(2)} \mid \cdots \mid v^{(n)} \right).$$

This way, each column is in $\mathbb{R}^m$. Consider now a different column vector, not in $\mathbb{R}^m$
but rather in $\mathbb{R}^n$:

$$w = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}.$$

How many components are here? As many as the number of columns in A. This is
just enough! We can now scan A column by column, multiply each column by the
corresponding component from w, and sum up:

$$Aw = w_1 v^{(1)} + w_2 v^{(2)} + \cdots + w_n v^{(n)} = \sum_{j=1}^{n} w_j v^{(j)}.$$

This is indeed A times w: a new linear combination of the columns of A, with
coefficients taken from w.
Thus, w and Aw may have different dimensions: w is n-dimensional, whereas Aw
is m-dimensional. Later on, we’ll use this property to interpret A geometrically, as a
mapping:

$$A : \mathbb{R}^n \to \mathbb{R}^m.$$

This means that A maps each n-dimensional vector to an m-dimensional vector.
In other words, A is a function: it takes an n-dimensional vector, and returns an
m-dimensional vector. Here, “$\to$” stands for mapping, not for a limit.

In Aw, what is the ith component? Well, it is just

$$(Aw)_i = \sum_{j=1}^{n} w_j v_i^{(j)} = \sum_{j=1}^{n} a_{i,j} w_j.$$

To calculate this, scan the ith row in A element by element. Multiply each element
by the corresponding component from w, and sum up.
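
The two points of view (a combination of columns, and a row-by-row scan) can be compared in a short Python sketch. The matrix is stored as a list of rows; the function names and the sample 3 × 4 matrix are our own hypothetical choices.

```python
def matvec_by_columns(A, w):
    """Aw as a linear combination of the columns of A, with coefficients from w."""
    m, n = len(A), len(A[0])
    if len(w) != n:
        raise ValueError("w must have as many components as A has columns")
    result = [0] * m
    for j in range(n):              # scan A column by column
        for i in range(m):
            result[i] += w[j] * A[i][j]
    return result


def matvec_by_rows(A, w):
    """Aw component by component: scan the ith row, multiply, and sum up."""
    return [sum(a * x for a, x in zip(row, w)) for row in A]


A = [[1, 2, 0, 1],
     [0, 1, 3, 0],
     [2, 0, 1, 1]]       # 3 x 4: maps a 4-dimensional vector to a 3-dimensional one
w = [1, 2, 3, 4]

print(matvec_by_columns(A, w))   # [9, 11, 9]
assert matvec_by_columns(A, w) == matvec_by_rows(A, w)
```

Both functions do the same mn multiplications; they differ only in the order of the two loops.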

This is the new matrix-times-vector operation. What are its algebraic properties? 
Well, first of all, it is associative for scalars: for any scalar $r \in \mathbb{R}$,


A(rw) = r(Aw) = (rA)w. 


Furthermore, it is also distributive in two senses. On one hand, you can push the 
same vector into parentheses: 


$$(A + B)w = Aw + Bw,$$
where B has the same dimensions as A. On the other hand, you can push the same 
matrix into parentheses: 


A(w +u) = Aw +Au, 


where u has the same dimension as w.

In summary, $w \to Aw$ is indeed a linear transformation. This will be useful later.
In Chap. 2, for example, we’ll see how a matrix could rotate a vector. Here, however,
we have no time for examples. To have the complete algebraic picture, we better
extend the above to a yet more complicated operation: matrix-times-matrix.


1.4.5 Matrix-Times-Matrix


The above can now be extended to define yet another kind of multiplication:
matrix-times-matrix. For this purpose, however, we must be careful to pick a matrix B with
proper dimensions: an l × m matrix, where l is some natural number as well.

Why must B have these dimensions? Because, this way, the number of columns
in B is the same as the number of rows in A. This is a must: it will help multiply B
times A soon.

Fortunately, A has already been written column by column. Now, apply B to each
individual column:

$$BA = \left( Bv^{(1)} \mid Bv^{(2)} \mid \cdots \mid Bv^{(n)} \right).$$

Why is this legitimate? Because A has m rows, and B has m columns. This way, the
product BA is a new l × n matrix: it has as many rows as in B, and as many columns
as in A.

In BA, what is the (i, k)th element? Well, for this purpose, focus on the kth column:
$Bv^{(k)}$. In it, look at the ith component. It comes from the ith row in B: scan it element
by element, multiply each element by the corresponding component in $v^{(k)}$, and sum
up:

$$(BA)_{i,k} = \left( Bv^{(k)} \right)_i = \sum_{j=1}^{m} b_{i,j} v_j^{(k)} = \sum_{j=1}^{m} b_{i,j} a_{j,k}, \quad 1 \le i \le l,\ 1 \le k \le n.$$

Thus, at the same time, two things are scanned element by element: the ith row in B,
and the kth column in A. This makes a loop of m steps. In each step, pick an element
from the ith row in B, pick the corresponding element from the kth column in A,
multiply, and sum up:

$$(BA)_{i,k} = \sum_{j=1}^{m} b_{i,j} a_{j,k}.$$
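
The loop of m steps described above can be written out directly. This is a plain Python sketch with matrices stored as lists of rows; the name `matmul` and the sample matrices are our own assumptions, and no attempt is made at efficiency.

```python
def matmul(B, A):
    """Return BA, where B is l x m and A is m x n (both stored as lists of rows)."""
    l, m, n = len(B), len(A), len(A[0])
    if any(len(row) != m for row in B):
        raise ValueError("B must have as many columns as A has rows")
    BA = [[0] * n for _ in range(l)]
    for i in range(l):                  # ith row in B
        for k in range(n):              # kth column in A
            for j in range(m):          # the loop of m steps
                BA[i][k] += B[i][j] * A[j][k]
    return BA


B = [[1, 0, 2],
     [0, 1, -1]]        # 2 x 3
A = [[1, 2],
     [3, 4],
     [5, 6]]            # 3 x 2

print(matmul(B, A))     # [[11, 14], [-2, -2]]
```

Here l = 2, m = 3, and n = 2, so the product is a 2 × 2 matrix: as many rows as in B, and as many columns as in A.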


1.4.6 Distributive and Associative Laws 


Fortunately, the new matrix-times-matrix operation is distributive in two senses. On
one hand, you can push the same matrix into parentheses from the right:

$$(B + \hat{B})A = BA + \hat{B}A$$

(where $\hat{B}$ has the same dimensions as B). On the other hand, you can push the same
matrix into parentheses from the left:

$$B(A + \hat{A}) = BA + B\hat{A}$$

(where $\hat{A}$ has the same dimensions as A).
Moreover, matrix-times-matrix is also an associative operation. To see this, let

$$C = (c_{i,j})_{1 \le i \le k,\ 1 \le j \le l}$$

be a k × l matrix, where k is some natural number as well.

Why are the dimensions of C picked in this way? Well, this guarantees that the 
number of columns in C is the same as the number of rows in B (and also in BA). 
This way, we can now apply C from the left, and produce the new matrices C(BA) 
and (CB)A. 

Are they the same? To check on this, let’s calculate the (s, t)th element, for some
$1 \le s \le k$ and $1 \le t \le n$:

$$\begin{aligned}
(C(BA))_{s,t} &= \sum_{i=1}^{l} c_{s,i} (BA)_{i,t} \\
&= \sum_{i=1}^{l} c_{s,i} \sum_{j=1}^{m} b_{i,j} a_{j,t} \\
&= \sum_{i=1}^{l} \sum_{j=1}^{m} c_{s,i} b_{i,j} a_{j,t} \\
&= \sum_{j=1}^{m} \sum_{i=1}^{l} c_{s,i} b_{i,j} a_{j,t} \\
&= \sum_{j=1}^{m} \left( \sum_{i=1}^{l} c_{s,i} b_{i,j} \right) a_{j,t} \\
&= \sum_{j=1}^{m} (CB)_{s,j} a_{j,t} \\
&= ((CB)A)_{s,t}.
\end{aligned}$$

This can be done for every pair (s, t). Therefore,

$$C(BA) = (CB)A,$$

as asserted. This proves that matrix multiplication is not only distributive but also
associative. In summary, it is a linear operation. This will be useful later.
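
As a numerical illustration (not a substitute for the proof above), the following Python sketch checks associativity on one sample triple of matrices, and also shows that the commutative law may fail. The compact `matmul` helper and all sample matrices are hypothetical choices of ours.

```python
def matmul(B, A):
    """Return BA for matrices stored as lists of rows (B is l x m, A is m x n)."""
    return [[sum(B[i][j] * A[j][k] for j in range(len(A)))
             for k in range(len(A[0]))]
            for i in range(len(B))]


C = [[1, 2], [0, 1], [3, 0]]       # 3 x 2 (k x l)
B = [[1, 0, 2], [0, 1, -1]]        # 2 x 3 (l x m)
A = [[1, 2], [3, 4], [5, 6]]       # 3 x 2 (m x n)
assert matmul(C, matmul(B, A)) == matmul(matmul(C, B), A)   # C(BA) = (CB)A

# Two square matrices that do not commute with each other:
P = [[0, 1], [0, 0]]
Q = [[0, 0], [1, 0]]
print(matmul(P, Q))   # [[1, 0], [0, 0]]
print(matmul(Q, P))   # [[0, 0], [0, 1]]
```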


1.4.7 The Transpose Matrix 


Consider again our matrix A. Look at it the other way around: view the rows as
columns, and the columns as rows. This yields a new n × m matrix, the transpose
matrix $A^t$:

$$(A^t)_{j,i} = a_{i,j}, \quad 1 \le i \le m,\ 1 \le j \le n.$$

For example, if A is the 3 × 4 matrix

$$A = \begin{pmatrix} a_{1,1} & a_{1,2} & a_{1,3} & a_{1,4} \\ a_{2,1} & a_{2,2} & a_{2,3} & a_{2,4} \\ a_{3,1} & a_{3,2} & a_{3,3} & a_{3,4} \end{pmatrix},$$

then $A^t$ is the 4 × 3 matrix

$$A^t = \begin{pmatrix} a_{1,1} & a_{2,1} & a_{3,1} \\ a_{1,2} & a_{2,2} & a_{3,2} \\ a_{1,3} & a_{2,3} & a_{3,3} \\ a_{1,4} & a_{2,4} & a_{3,4} \end{pmatrix}.$$

From the above definition, it clearly follows that

$$(A^t)^t = A.$$

In Sect. 1.4.5, we’ve defined the product BA, where B is an l × m matrix. Why is this
product well defined? Because the number of rows in A is the same as the number
of columns in B.

How to multiply the transpose matrices? Well, the number of rows in $B^t$ is the
same as the number of columns in $A^t$. So, we can construct a new n × l matrix: the
product $A^t B^t$.

Is it really new? Not quite. After all, we’ve already seen it before, at least in its
transpose form:

$$(BA)^t = A^t B^t.$$

To prove this, pick some $1 \le i \le l$ and $1 \le k \le n$. Consider the (k, i)th element in
$(BA)^t$:

$$\left( (BA)^t \right)_{k,i} = (BA)_{i,k} = \sum_{j=1}^{m} b_{i,j} a_{j,k} = \sum_{j=1}^{m} (A^t)_{k,j} (B^t)_{j,i} = (A^t B^t)_{k,i},$$

as asserted.
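
The transpose and its product rule are easy to try out. In this Python sketch, `transpose` and `matmul` are our own helper names, matrices are lists of rows, and the assertions check the two identities above on one sample pair only.

```python
def transpose(A):
    """View the rows of A as columns, and the columns as rows."""
    return [list(column) for column in zip(*A)]


def matmul(B, A):
    """Return BA for matrices stored as lists of rows."""
    return [[sum(B[i][j] * A[j][k] for j in range(len(A)))
             for k in range(len(A[0]))]
            for i in range(len(B))]


B = [[1, 0, 2],
     [0, 1, -1]]        # l x m = 2 x 3
A = [[1, 2],
     [3, 4],
     [5, 6]]            # m x n = 3 x 2

print(transpose(A))     # [[1, 3, 5], [2, 4, 6]]
assert transpose(transpose(A)) == A                                     # (A^t)^t = A
assert transpose(matmul(B, A)) == matmul(transpose(A), transpose(B))    # (BA)^t = A^t B^t
```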
1.5 Square Matrix


1.5.1 Symmetric Square Matrix 


So far, we’ve assumed that A was rectangular: the number of rows was not necessarily
the same as the number of columns. In this section, on the other hand, we focus on
a square matrix of order m = n. Since A is square, it has a main diagonal, running from the
upper left corner to the lower right corner:

$$a_{1,1}, a_{2,2}, a_{3,3}, \ldots, a_{n,n}$$

(Fig. 1.8). This diagonal splits A into two triangular parts: its upper right part, and its
lower left part. If they mirror each other, then we say that A is symmetric.

This means that one could place a mirror on the main diagonal: the (i, j)th element
is the same as the (j, i)th element:

$$a_{i,j} = a_{j,i}, \quad 1 \le i, j \le n.$$

In other words, A remains unchanged under interchanging the roles of rows and
columns:

$$A = A^t.$$


1.5.2 The Identity Matrix 


Here is an example of a symmetric matrix: the identity matrix of order n, denoted by
I. On the main diagonal, it is 1:

$$I_{i,i} = 1, \quad 1 \le i \le n.$$

Off the main diagonal, on the other hand, it is 0:

$$I_{i,j} = 0, \quad 1 \le i \ne j \le n.$$

In summary,

$$I = \begin{pmatrix} 1 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix}.$$

Off the main diagonal, all elements vanish. No need to write all these zeroes. A blank
space stands for a zero element.
Why is the identity matrix so special? Well, apply it to just any n-dimensional
vector v, and you’d see no effect whatsoever:

$$Iv = v.$$

Likewise, apply I to just any square matrix A of order n, and you’d see no effect
whatsoever:

$$IA = AI = A.$$

This is why I is also called the unit matrix.

Fig. 1.8 In a square matrix A of order n, the main diagonal contains $a_{1,1}, a_{2,2}, \ldots, a_{n,n}$.
If A is symmetric, then the lower triangular part mirrors the upper triangular
part: $a_{j,i} = a_{i,j}$


1.5.3 The Inverse Matrix as a Mapping 


The square matrix A may also have an inverse matrix: a new matrix $A^{-1}$, satisfying

$$A^{-1}A = I.$$

In this case, we say that A is nonsingular or invertible.

Thus, in the world of matrices, a nonsingular matrix plays the role of a nonzero
number, and its inverse plays the role of the reciprocal. Later on, we’ll see this in a
yet wider context: group theory.

Recall that A could also be viewed as a mapping of vectors:

$$v \to Av.$$

How to map Av back to v? Easy: just apply $A^{-1}$ to it:

$$Av \to A^{-1}(Av) = (A^{-1}A)v = Iv = v.$$

This way, thanks to associativity, $A^{-1}$ maps the vector Av back to v.

So, both A and $A^{-1}$ could actually be viewed as mappings. This could be quite
useful. Indeed, let’s look at them the other way around: $A^{-1}$ is now the original
mapping that maps

$$v \to A^{-1}v,$$

and A is its mirror: the inverse mapping that maps $A^{-1}v$ back to v:

$$A^{-1}v \to A(A^{-1}v) = v.$$

In the language of matrices, this could be written simply as

$$AA^{-1} = I.$$

In summary, although matrices do not necessarily commute, A and $A^{-1}$ do commute
with each other.


1.5.4 Inverse and Transpose 


The inverse matrix could be quite hard to calculate. Still, once calculated, it is useful 
for many purposes. 

What is the inverse of $A^t$? No need to calculate it! After all, $A^{-1}$ is already available. 
To have the answer, just take its transpose. Indeed, from Sect. 1.4.7, 

$$(A^{-1})^t A^t = (AA^{-1})^t = I^t = I.$$

In other words, 

$$(A^t)^{-1} = (A^{-1})^t,$$

as asserted. In summary, the inverse of the transpose is just the transpose of the 
inverse. Thus, no parentheses are needed: one could simply write 

$$A^{-t} = (A^t)^{-1} = (A^{-1})^t.$$

Finally, let A and B be two square nonsingular matrices of order n. What is the inverse of the 
product? It is just the product of the inverses, in the reverse order: 

$$(AB)^{-1} = B^{-1}A^{-1}.$$

Indeed, thanks to associativity, 

$$(AB)(B^{-1}A^{-1}) = A(B(B^{-1}A^{-1})) = A((BB^{-1})A^{-1}) = A(IA^{-1}) = AA^{-1} = I.$$
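
The rules of Sects. 1.5.3 and 1.5.4 can be checked numerically on a small example. The following is only a minimal Python sketch, not part of the original text: the helper names (`matmul`, `transpose`, `inverse_2x2`, `close`) and the sample matrices are illustrative assumptions, and the closed-form inverse used here works for 2 x 2 matrices only.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Interchange the roles of rows and columns."""
    return [list(row) for row in zip(*A)]

def inverse_2x2(A):
    """Inverse of a nonsingular 2 x 2 matrix (cofactor formula)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def close(A, B, tol=1e-12):
    """Compare two matrices element by element, up to round-off."""
    return all(abs(x - y) <= tol
               for row_a, row_b in zip(A, B) for x, y in zip(row_a, row_b))

# Hypothetical sample matrices, chosen only for illustration
A = [[2.0, 1.0], [1.0, 1.0]]
B = [[1.0, 2.0], [3.0, 4.0]]
I = [[1.0, 0.0], [0.0, 1.0]]

# A and its inverse commute: A^{-1} A = A A^{-1} = I
inverse_commutes = (close(matmul(inverse_2x2(A), A), I)
                    and close(matmul(A, inverse_2x2(A)), I))
# The inverse of the transpose is the transpose of the inverse
transpose_rule = close(inverse_2x2(transpose(A)), transpose(inverse_2x2(A)))
# The inverse of a product is the product of the inverses, in reverse order
product_rule = close(inverse_2x2(matmul(A, B)),
                     matmul(inverse_2x2(B), inverse_2x2(A)))
```

All three flags come out true for these matrices. For larger matrices, one would replace `inverse_2x2` by a general elimination routine.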


1.6 The Hermitian Adjoint 


1.6.1 Complex Matrix and Its Hermitian Adjoint 


Let’s go back to a rectangular matrix, which is not necessarily square. So far, we 
have considered a real matrix, with real elements in R. Let’s extend the discussion 
to an m x n complex matrix, with complex elements in C. 

Fortunately, the same properties still hold, including the distributive and associa- 
tive laws. The transpose, however, must be replaced by a more general notion: the 
Hermitian adjoint. 

Recall that the complex number 

$$c = a + b\sqrt{-1}$$

has the complex conjugate 

$$\bar{c} = a - b\sqrt{-1}$$

(Sect. 1.3.1). For a given complex matrix A, its Hermitian adjoint is a new matrix: the 
conjugate transpose matrix. It can be obtained in two stages: 


e take its transpose, 
e and then take the complex conjugate of each individual element. 


This makes a new n × m matrix, denoted by $A^h$ (or $A^*$): 

$$A^h_{j,i} = \bar{a}_{i,j}, \quad 1 \le i \le m, \ 1 \le j \le n.$$

In particular, if A happens to be a real matrix, then the complex conjugate has no 
effect whatsoever. In this case, the Hermitian adjoint is just the same as the transpose: 

$$A^h = A^t.$$

This way, the Hermitian adjoint is indeed a natural extension of the notion of the 
transpose to the wider set of complex matrices. 
As in Sect. 1.4.7, it is easy to see that 

$$(A^h)^h = A,$$

and that 

$$(BA)^h = A^h B^h,$$

provided that the number of columns in B is the same as the number of rows in A. 




Fig. 1.9 In a Hermitian matrix (m = n), the lower triangular part is the complex conjugate of the upper triangular part: $a_{j,i} = \bar{a}_{i,j}$ for $1 \le i < j \le n$


1.6.2 Hermitian (Self-Adjoint) Matrix 


Consider now a complex square matrix A, of order m = n. We say that A is Hermitian 
(or self-adjoint) if it is the same as its Hermitian adjoint: 

$$A = A^h.$$

In this case, the (i, j)th element is the complex conjugate of the (j, i)th element: 

$$a_{i,j} = A^h_{i,j} = \bar{a}_{j,i}, \quad 1 \le i, j \le n.$$

Thus, the lower triangular part is the complex conjugate of the upper triangular 
part (Fig. 1.9). In particular, the main-diagonal elements must be real: 

$$a_{i,i} = \bar{a}_{i,i}, \quad 1 \le i \le n.$$
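
To make the two-stage recipe concrete, here is a small Python sketch (an illustration only, not taken from the book). The function names `adjoint`, `matmul`, and `is_hermitian`, and the sample matrices, are assumptions made for this example. Python's built-in `complex` type supplies the conjugate.

```python
def adjoint(A):
    """Hermitian adjoint: take the transpose, then conjugate each element."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def is_hermitian(A):
    """A square matrix is Hermitian if it equals its own adjoint."""
    return len(A) == len(A[0]) and A == adjoint(A)

# Hypothetical examples: A is 2 x 3, B is 2 x 2, so BA is well defined
A = [[1 + 2j, 0, 3j],
     [2, 1 - 1j, 4]]
B = [[1j, 1],
     [2, 1 - 1j]]

# (A^h)^h = A, and (BA)^h = A^h B^h
double_adjoint_ok = adjoint(adjoint(A)) == A
product_rule_ok = adjoint(matmul(B, A)) == matmul(adjoint(A), adjoint(B))

# A Hermitian matrix: the lower triangle mirrors the upper one, with a bar on top
H = [[2, 1 - 1j],
     [1 + 1j, 3]]
diagonal_is_real = all(complex(H[i][i]).imag == 0 for i in range(len(H)))
```

Exact comparison with `==` is safe here only because all entries are small integers. With general floating-point entries, a tolerance would be needed.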


1.7 Inner Product and Norm 


1.7.1 Inner Product 


Let 

$$u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} \quad \text{and} \quad v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}$$

be two column vectors in $\mathbb{C}^n$. They can also be viewed as narrow n × 1 “matrices”, 
with just one column. This way, they also have their own Hermitian adjoint: 

$$u^h = (\bar{u}_1, \bar{u}_2, \ldots, \bar{u}_n) \quad \text{and} \quad v^h = (\bar{v}_1, \bar{v}_2, \ldots, \bar{v}_n),$$

which are 1 × n “matrices”, or n-dimensional row vectors. 

Thus, $u^h$ has n “columns”, and v has n “rows”. Fortunately, this is the same number. 
So, we can now go ahead and multiply $u^h$ times v, as in Sect. 1.4.5. The result is a 
new 1 × 1 “matrix”, or just a new (complex) scalar: 

$$(u, v) = u^h v = \sum_{j=1}^n \bar{u}_j v_j.$$

This is called the scalar product, or the inner product of u and v. 
And what would happen if we dropped the bar, and didn’t use the complex conjugate 
at all? This would produce the so-called “real” inner product: 

$$u^t v = \sum_{j=1}^n u_j v_j.$$

In general, this is not necessarily a real number: it could be complex as well. Still, if 
u and v are real vectors in $\mathbb{R}^n$, then this number must be real: the same as the original 
inner product. This is why it is called the “real” inner product. 

Let’s return now to the original inner product defined above. Let $c \in \mathbb{C}$ be some 
complex number. Then, we could “pull” c out of parentheses, possibly with a bar on 
top: 

$$(cu, v) = \bar{c}(u, v),$$

and 

$$(u, cv) = c(u, v).$$

Finally, the inner product is a conjugate-symmetric operation: interchanging u and v 
introduces a bar on top. Indeed, 

$$(v, u) = \sum_{j=1}^n \bar{v}_j u_j$$

is the complex conjugate of (u, v). 


1.7.2 Norm 


What is the inner product of v with itself? Well, this is a real nonnegative number: 

$$(v, v) = \sum_{j=1}^n \bar{v}_j v_j = \sum_{j=1}^n |v_j|^2 \ge 0.$$

Could it be zero? Only if v was the zero vector: 

$$(v, v) = 0 \iff v = \mathbf{0}.$$

Thus, (v, v) has a square root. Let’s use it to define the norm (or length, or magnitude) 
of v: 

$$\|v\| = \sqrt{(v, v)}.$$

This way, 

$$\|v\| \ge 0,$$

and 

$$\|v\| = 0 \iff v = \mathbf{0},$$

as expected from a magnitude. Furthermore, every complex number $c \in \mathbb{C}$ could be 
“pulled” out of the $\|\cdot\|$ sign: 

$$\|cv\| = \sqrt{(cv, cv)} = \sqrt{\bar{c}c(v, v)} = \sqrt{|c|^2 (v, v)} = |c| \sqrt{(v, v)} = |c| \cdot \|v\|.$$

This way, if v is a nonzero vector, then it could be normalized. Indeed, since $\|v\| > 0$, 
one could pick $c = 1/\|v\|$, to produce the normalized unit vector $v/\|v\|$: the unique 
vector of norm 1 that is a positive multiple of v. 

The norm $\|v\|$ defined above is also called the $l_2$-norm, and is also denoted by 
$\|v\|_2$, to distinguish it from other norms: the $l_1$-norm 

$$\|v\|_1 = \sum_{i=1}^n |v_i|,$$

and the $l_\infty$-norm (or the maximum norm) 

$$\|v\|_\infty = \max_{1 \le i \le n} |v_i|.$$

Like the $l_2$-norm defined above, these norms also make sense: they vanish if and only 
if v is the zero vector. Furthermore, every complex number $c \in \mathbb{C}$ could be pulled 
out: 

$$\|cv\|_1 = |c| \cdot \|v\|_1, \quad \text{and} \quad \|cv\|_\infty = |c| \cdot \|v\|_\infty.$$

In what follows, however, we mostly use the $l_2$-norm. This is why we denote it simply 
by $\|v\|$ for short, rather than $\|v\|_2$. 
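
The inner product and the three norms are easy to try out on the computer. The Python sketch below is only an illustration under stated assumptions: the function names (`inner`, `norm2`, `norm1`, `norm_inf`) and the sample vectors are hypothetical, not the book's notation.

```python
import math

def inner(u, v):
    """Inner product (u, v) = u^h v: conjugate the components of u."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def norm2(v):
    """l_2-norm: square root of the inner product of v with itself."""
    return math.sqrt(inner(v, v).real)

def norm1(v):
    """l_1-norm: sum of absolute values."""
    return sum(abs(x) for x in v)

def norm_inf(v):
    """Maximum norm: largest absolute value."""
    return max(abs(x) for x in v)

# Hypothetical sample data
u = [1 + 1j, 2, -1j]
v = [3, 1j, 1 - 1j]
c = 2 - 1j
cu = [c * x for x in u]
cv = [c * x for x in v]

# Pulling c out of the first argument introduces a bar on top
bar_on_first = abs(inner(cu, v) - c.conjugate() * inner(u, v))
# Pulling c out of the second argument does not
no_bar_on_second = abs(inner(u, cv) - c * inner(u, v))
# Interchanging u and v gives the complex conjugate
swap_gap = abs(inner(v, u) - inner(u, v).conjugate())

# Normalization: v / ||v|| has norm 1
unit = [x / norm2(v) for x in v]
```

Each of the three gaps is zero up to round-off, and `norm2(unit)` is 1 up to round-off.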


1.7.3 Inner Product and the Hermitian Adjoint 


Assume again that A is an m × n rectangular complex matrix. Let u and v be complex 
vectors of different dimensions: u is m-dimensional, and v is n-dimensional. This 
way, Av is m-dimensional as well (Sect. 1.4.4). We can now take the inner product 
of two m-dimensional vectors: 

$$(u, Av) = u^h A v.$$

This is a well-defined complex scalar (Sect. 1.7.1). 

Furthermore, $A^h$ is an n × m rectangular complex matrix. This way, $A^h$ can be 
applied to u: the result $A^h u$ is n-dimensional. For this reason, the inner product 
$(A^h u, v)$ is a well-defined complex scalar. 

Recall that both u and v could also be viewed as narrow matrices. Thanks to 
associativity (Sect. 1.4.6), we now have 

$$(u, Av) = u^h(Av) = (u^h A)v = (A^h u)^h v = (A^h u, v).$$


Let’s use this result in a special case: a square matrix. 
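
The formula $(u, Av) = (A^h u, v)$ can be tested on a rectangular example. This is a hedged Python sketch with made-up data. The names `adjoint`, `matvec`, and `inner` are assumptions introduced for this illustration only.

```python
def adjoint(A):
    """Hermitian adjoint: conjugate transpose."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def matvec(A, x):
    """Apply the matrix A (a list of rows) to the vector x."""
    return [sum(a * x_k for a, x_k in zip(row, x)) for row in A]

def inner(u, v):
    """Inner product (u, v) = u^h v."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

# Hypothetical data: A is m x n with m = 2 and n = 3
A = [[1 + 2j, 0, 3j],
     [2, 1 - 1j, 4]]
u = [1 - 1j, 2j]          # m-dimensional
v = [3, 1j, -1 + 1j]      # n-dimensional

lhs = inner(u, matvec(A, v))           # (u, Av): two m-dimensional vectors
rhs = inner(matvec(adjoint(A), u), v)  # (A^h u, v): two n-dimensional vectors
```

Both sides agree. Note that the two inner products live in different dimensions: the left one in m dimensions, and the right one in n dimensions.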


1.7.4 Inner Product and Hermitian Matrix 


Let’s look at an interesting special case: m = n, so A is now a square matrix. Assume 
that A is also Hermitian: 

$$A = A^h$$

(Sect. 1.6.2). In this case, the above formula reduces to 

$$(u, Av) = (Au, v),$$

for every two n-dimensional complex vectors u and v. 

Let us now go the other way around. Assume that we didn’t know yet that A 
was Hermitian. Instead, we only knew that A was a square complex matrix of order 
m = n, satisfying 

$$(u, Av) = (Au, v)$$

for every two n-dimensional complex vectors u and v. Could we then conclude that 
A was Hermitian? 

Fortunately, yes. For this purpose, let’s use the above formula in a special case. 
For each pair of natural numbers $1 \le i, j \le n$, let u and v be standard unit vectors, 
with one nonzero component only: 

$$u_k = \begin{cases} 1 & \text{if } k = i \\ 0 & \text{if } k \ne i \end{cases} \quad \text{and} \quad v_k = \begin{cases} 1 & \text{if } k = j \\ 0 & \text{if } k \ne j \end{cases} \quad (1 \le k \le n).$$

With this choice, 

$$a_{i,j} = (u, Av) = (Au, v) = \bar{a}_{j,i},$$

implying that A is indeed Hermitian (Fig. 1.9). 


1.8 Orthogonal and Unitary Matrix 


1.8.1 Inner Product of Column Vectors 


Assume again that A is an m × n rectangular complex matrix. For $1 \le i \le n$, let $v^{(i)}$ 
denote the ith column in A: an m-dimensional column vector. Let us show that, for 
$1 \le i, j \le n$, the inner product of the ith column with the jth column is the same as 
the (i, j)th element in $A^h A$. Indeed, 

$$\begin{aligned}
(A^h A)_{i,j} &= \sum_{k=1}^m (A^h)_{i,k} a_{k,j} \\
&= \sum_{k=1}^m \bar{a}_{k,i} a_{k,j} \\
&= \sum_{k=1}^m \bar{v}^{(i)}_k v^{(j)}_k \\
&= (v^{(i)}, v^{(j)}).
\end{aligned}$$


This formula will be useful below. 


1.8.2 Orthogonal and Orthonormal Column Vectors 


Consider two complex vectors u and v, of the same dimension. We say that they are 
orthogonal to each other if their inner product vanishes: 

$$(u, v) = 0.$$

Furthermore, we also say that u and v are orthonormal if they are not only orthogonal 
to each other but also have norm 1: 

$$(u, v) = 0, \quad \text{and} \quad \|u\| = \|v\| = 1.$$

Consider again our m × n rectangular complex matrix, written column by column: 

$$A = \left( v^{(1)} \mid v^{(2)} \mid \cdots \mid v^{(n)} \right),$$

for some $m \ge n$. Assume that its columns are orthonormal: 

$$(v^{(i)}, v^{(j)}) = 0, \quad 1 \le i, j \le n, \ i \ne j,$$

and 

$$\|v^{(i)}\| = 1, \quad 1 \le i \le n.$$

In this case, what is the (i, j)th element in $A^h A$? Well, it is either zero or one: 

$$(A^h A)_{i,j} = (v^{(i)}, v^{(j)}) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}$$

(Sect. 1.8.1). In other words, $A^h A$ is just the identity matrix of order n: 

$$A^h A = I.$$

As a result, A preserves the inner product of any two n-dimensional vectors u and v: 

$$(Au, Av) = (A^h Au, v) = (Iu, v) = (u, v)$$

(Sect. 1.7.3). In particular, by picking u = v, A preserves norm as well: 

$$\|Av\|^2 = (Av, Av) = (v, v) = \|v\|^2.$$

Let us now go the other way around. Assume that we didn’t know yet that A had 
orthonormal columns. Instead, we only knew that $A^h A$ was the identity matrix: 

$$A^h A = I.$$

Could we then conclude that A must also have orthonormal columns? Fortunately, 
yes. Indeed, from Sect. 1.8.1, 

$$(v^{(i)}, v^{(j)}) = (A^h A)_{i,j} = I_{i,j} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j, \end{cases}$$

as asserted. In summary, A has orthonormal columns if and only if 

$$A^h A = I$$

is the identity matrix of order $n \le m$. 


1.8.3 Projection Matrix 


So far, we’ve studied the product $A^h A$. Now, let’s multiply the other way around: $AA^h$. 
This could be a completely different matrix: after all, the commutative law doesn’t 
apply to matrices. Still, is $AA^h$ special in any way? 

Well, with the above assumptions ($m \ge n$ and A has orthonormal columns), it 
indeed is. To see how, multiply the above equation by A from the left: 

$$A(A^h A) = AI = A.$$

Next, multiply this new equation by $A^h$ from the right: 

$$\left(A(A^h A)\right)A^h = AA^h.$$

Thanks to the associative law (Sect. 1.4.6), this equation can also be written as 

$$(AA^h)(AA^h) = AA^h.$$

Thus, $AA^h$ is a projection matrix: if you square it, it still remains the same. 
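
Here is a small numerical illustration of Sects. 1.8.2 and 1.8.3, written as a Python sketch. The 3 x 2 matrix below is a hypothetical example with orthonormal columns, and the helper names (`adjoint`, `matmul`, `matvec`, `norm`, `close`) are assumptions made for this illustration.

```python
import math

def adjoint(A):
    """Hermitian adjoint: conjugate transpose."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, x):
    """Apply the matrix A to the vector x."""
    return [sum(a * x_k for a, x_k in zip(row, x)) for row in A]

def norm(v):
    """l_2-norm of a complex vector."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))

def close(A, B, tol=1e-12):
    """Compare two matrices element by element, up to round-off."""
    return all(abs(x - y) <= tol
               for row_a, row_b in zip(A, B) for x, y in zip(row_a, row_b))

s = 1 / math.sqrt(2)
# m = 3 rows, n = 2 columns; the columns are orthonormal
A = [[s,      0],
     [s * 1j, 0],
     [0,      1]]

AhA = matmul(adjoint(A), A)   # n x n: the identity of order 2
P = matmul(A, adjoint(A))     # m x m: a projection, not the identity
P_squared = matmul(P, P)

v = [2 - 1j, 3j]              # any 2-dimensional vector
norm_gap = abs(norm(matvec(A, v)) - norm(v))
```

For this example, `AhA` is the 2 x 2 identity, `P_squared` agrees with `P`, and `norm_gap` vanishes up to round-off, although `P` itself is not the 3 x 3 identity.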


1.8.4 Unitary and Orthogonal Matrix 


Assume now that A is a square matrix of order m = n. If A also has orthonormal 
columns, then A is called a unitary matrix. In this case, the above formulas tell us 
that 

$$A^h A = AA^h = I$$

is the identity matrix of order n. This could also be written in terms of the inverse 
matrix: 

$$A^h = A^{-1} \quad \text{and} \quad A = (A^h)^{-1}.$$

As in Sect. 1.5.4, the inverse of the Hermitian adjoint is just the Hermitian adjoint of the inverse. 
Therefore, one could drop these parentheses, and simply write 

$$A = A^{-h}.$$

If A is also real, then it is also called an orthogonal matrix. In this case, 

$$A^t A = AA^t = I,$$

so 

$$A^t = A^{-1} \quad \text{and} \quad A = (A^t)^{-1} = A^{-t}.$$


1.9 Eigenvalues and Eigenvectors 


1.9.1 Eigenvectors and Eigenvalues 


Let A be a square complex matrix of order n. An eigenvector of A is a nonzero vector 
$v \in \mathbb{C}^n$ satisfying 

$$Av = \lambda v,$$

for some scalar $\lambda \in \mathbb{C}$. In other words, applying A to v has the same effect as 
multiplying v by $\lambda$. The number $\lambda$ is then called the eigenvalue of A associated with 
the eigenvector v. 

Note that, for every nonzero number $c \in \mathbb{C}$, cv is a legitimate eigenvector as well: 

$$A(cv) = cAv = c\lambda v = \lambda(cv)$$

(Sect. 1.4.4). Thus, the eigenvector associated with $\lambda$ is not defined uniquely, but 
only up to a (nonzero) scalar multiple. 

What is the best way to pick c? Well, since $v \ne \mathbf{0}$, $\|v\| > 0$ (Sect. 1.7.2). Thus, best 
pick $c = 1/\|v\|$. This would “normalize” v, and produce a “new” unit eigenvector: 

$$A\left(\frac{v}{\|v\|}\right) = \lambda\left(\frac{v}{\|v\|}\right).$$

This is the unique unit eigenvector that is a positive multiple of v. 



1.9.2 Singular Matrix and Its Null Space 


What are the algebraic properties of the eigenvector v? First of all, it is nonzero: 

$$v \ne \mathbf{0},$$

where $\mathbf{0}$ is the n-dimensional zero vector. Furthermore, v also satisfies 

$$(A - \lambda I)v = Av - \lambda v = \mathbf{0},$$

where I is the identity matrix of order n. Thus, the matrix $A - \lambda I$ maps v to $\mathbf{0}$. In 
other words, v is in the null space of $A - \lambda I$. 

For this reason, $A - \lambda I$ must be singular (not invertible). Indeed, by contradiction: 
if there were an inverse matrix $(A - \lambda I)^{-1}$, then we could apply it to the zero vector, to 
map it back to v. On the other hand, from the very definition of matrix-times-vector, 
this must be the zero vector: 

$$v = (A - \lambda I)^{-1}\mathbf{0} = \mathbf{0},$$

in violation of the very definition of v as a nonzero eigenvector. Thus, $A - \lambda I$ must 
indeed be singular, as asserted. Let’s use this to design an eigenvalue for $A^h$ as well. 


1.9.3 Eigenvalues of the Hermitian Adjoint 


So far, we looked at the matrix $A - \lambda I$. What about its Hermitian adjoint 

$$(A - \lambda I)^h = A^h - \bar{\lambda} I?$$

Well, it is singular as well. Indeed, by contradiction: if it were nonsingular (invertible), 
then there would exist some nonzero n-dimensional vector u, mapped to v: 

$$(A^h - \bar{\lambda} I)u = v.$$

This would lead to a contradiction: 

$$0 = (\mathbf{0}, u) = ((A - \lambda I)v, u) = (v, (A^h - \bar{\lambda} I)u) = (v, v) > 0.$$

So, $A^h - \bar{\lambda} I$ must be singular as well. As such, it must map some nonzero vector 
$w \ne \mathbf{0}$ to the zero vector: 

$$(A^h - \bar{\lambda} I)w = \mathbf{0},$$

or 

$$A^h w = \bar{\lambda} w.$$

Thus, $\bar{\lambda}$ must be an eigenvalue of $A^h$. In summary, the complex conjugate of any 
eigenvalue of A is an eigenvalue of $A^h$. 


1.9.4 Eigenvalues of a Hermitian Matrix 


Assume now that A is also Hermitian: 

$$A = A^h$$

(Sect. 1.6.2). From Sect. 1.7.4, it then follows that 

$$\lambda(v, v) = (v, \lambda v) = (v, Av) = (Av, v) = (\lambda v, v) = \bar{\lambda}(v, v).$$

Now, because v is a nonzero vector, (v, v) > 0 (Sect. 1.7.2). By dividing by (v, v), 
we then obtain 

$$\lambda = \bar{\lambda},$$

so $\lambda$ is real: 

$$\lambda \in \mathbb{R}.$$

In summary, a Hermitian matrix has real eigenvalues only. As a matter of fact, in the 
world of matrices, a Hermitian matrix plays the role of a real number. 


1.9.5 Eigenvectors of a Hermitian Matrix 


So far, we’ve discussed the eigenvalues of a Hermitian matrix, and saw that they 
must be real. What about the eigenvectors? What are their algebraic properties? 

To answer this, let u and v be two eigenvectors of the Hermitian matrix A: 

$$Av = \lambda v \quad \text{and} \quad Au = \mu u,$$

where $\mu \ne \lambda$ are two distinct eigenvalues. What is the relation between u and v? 
Well, it turns out that they are orthogonal to each other. Indeed, thanks to Sects. 1.7.4 
and 1.9.4, 

$$\mu(u, v) = \bar{\mu}(u, v) = (\mu u, v) = (Au, v) = (u, Av) = (u, \lambda v) = \lambda(u, v).$$

Now, since $\mu \ne \lambda$, we must have 

$$(u, v) = 0,$$

as asserted. Furthermore, we can normalize both u and v, to obtain the orthonormal 
eigenvectors $u/\|u\|$ and $v/\|v\|$. This will be useful later. Let’s see some interesting 
examples. 
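
Before moving on, the results of Sects. 1.9.4 and 1.9.5 can be seen at work on a 2 x 2 Hermitian matrix, where the eigenvalues follow from the quadratic formula. The Python sketch below is an illustration only: the function `eig_hermitian_2x2` and the sample matrix are hypothetical, and the eigenvector formula used in it assumes that the off-diagonal element is nonzero.

```python
import math

def matvec(A, x):
    """Apply the matrix A to the vector x."""
    return [sum(a * x_k for a, x_k in zip(row, x)) for row in A]

def inner(u, v):
    """Inner product (u, v) = u^h v."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def eig_hermitian_2x2(A):
    """Eigenpairs of a Hermitian matrix [[a, b], [conj(b), d]] with b != 0."""
    a = A[0][0].real
    d = A[1][1].real
    b = A[0][1]
    # The discriminant (a - d)^2 + 4|b|^2 is never negative: real eigenvalues
    root = math.sqrt((a - d) ** 2 + 4 * (b.real ** 2 + b.imag ** 2))
    eigenvalues = [(a + d - root) / 2, (a + d + root) / 2]
    # (b, lam - a)^t solves the first row of (A - lam I) x = 0
    eigenvectors = [[b, lam - a] for lam in eigenvalues]
    return eigenvalues, eigenvectors

# Hypothetical Hermitian example
A = [[2 + 0j, 1 - 1j],
     [1 + 1j, 3 + 0j]]
(lam, mu), (v, u) = eig_hermitian_2x2(A)

residual_v = max(abs(x - lam * y) for x, y in zip(matvec(A, v), v))
residual_u = max(abs(x - mu * y) for x, y in zip(matvec(A, u), u))
overlap = abs(inner(u, v))   # vanishes: distinct eigenvalues, orthogonal eigenvectors
```

For this matrix, the two eigenvalues are real and distinct, both residuals vanish, and so does the overlap between the two eigenvectors.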


1.10 The Sine Transform 


1.10.1 Discrete Sine Waves 


An interesting example is the discrete sine wave. To obtain it, just sample the sine 
function (Fig. 1.10). 

Fig. 1.10 The smoothest sine wave: sin(πx). To obtain the discrete sine mode, just sample at n 
discrete points: x = 1/(n + 1), x = 2/(n + 1), x = 3/(n + 1), ..., x = n/(n + 1) 


For a fixed $1 \le j \le n$, consider the sine mode (or wave, or oscillation) 

$$\sin(j\pi x).$$

What is the role of j? Well, j is the wave number: it tells us how fast the wave oscillates. 
In fact, as j increases, the above function oscillates more and more rapidly. For a small 
j, the function is smooth, and oscillates just a little. For a greater j, on the other hand, 
the wave oscillates more rapidly: move x just a little, and you may already see an 
oscillation. In other words, if x measures the time, then the wave oscillates more 
frequently. This is why j is called the wave number, or the frequency. 

The above sine mode is continuous: it is a function of every $x \in \mathbb{R}$. How to obtain 
the discrete sine mode? For this purpose, do two things: 

• Discretize: sample the original sine mode at n equidistant points between 0 and 1: 

$$\frac{1}{n+1}, \ \frac{2}{n+1}, \ \frac{3}{n+1}, \ \ldots, \ \frac{n}{n+1}.$$

This produces the n-dimensional column vector 

$$\left( \sin\left(\frac{j\pi}{n+1}\right), \ \sin\left(\frac{2j\pi}{n+1}\right), \ \ldots, \ \sin\left(\frac{nj\pi}{n+1}\right) \right)^t.$$

• Normalize: multiply by $\sqrt{2/(n+1)}$, to produce the new column vector 

$$v^{(j)} = \sqrt{\frac{2}{n+1}} \left( \sin\left(\frac{j\pi}{n+1}\right), \ \sin\left(\frac{2j\pi}{n+1}\right), \ \ldots, \ \sin\left(\frac{nj\pi}{n+1}\right) \right)^t.$$

This is the discrete sine mode. Thanks to the above normalization, it has norm 1: 

$$\|v^{(j)}\| = 1$$

(see exercises below). Furthermore, as we’ll see next, the discrete sine modes are 
orthogonal to each other. 


1.10.2 Orthogonality of the Discrete Sine Waves 


Are the discrete sine modes orthogonal to each other? Yes! Indeed, they are the 
eigenvectors of a new symmetric matrix T. T is tridiagonal: it has nonzero elements 
on three diagonals only: the main diagonal, the diagonal just above it, and the 
diagonal just below it: 

$$T = \mathrm{tridiag}(-1, 2, -1) = \begin{pmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{pmatrix}.$$

The rest of the elements, on the other hand, vanish. No need to write them explicitly: 
the blank spaces stand for zeroes. 
As a real symmetric matrix, T is also Hermitian: 

$$T^h = T^t = T.$$

What are its eigenvectors? The discrete sine modes! To see this, recall the formula: 

$$\sin(\theta + \phi) + \sin(\theta - \phi) = 2\sin(\theta)\cos(\phi).$$


To study the ith component in $v^{(j)}$, set in this formula 

$$\theta = \frac{ij\pi}{n+1} \quad \text{and} \quad \phi = \frac{j\pi}{n+1}.$$

This gives 

$$\sin\left(\frac{(i+1)j\pi}{n+1}\right) + \sin\left(\frac{(i-1)j\pi}{n+1}\right) = 2\sin\left(\frac{ij\pi}{n+1}\right)\cos\left(\frac{j\pi}{n+1}\right).$$

By doing this for all components $1 \le i \le n$, we have 

$$Tv^{(j)} = \lambda_j v^{(j)},$$

where 

$$\lambda_j = 2 - 2\cos\left(\frac{j\pi}{n+1}\right) = 4\sin^2\left(\frac{j\pi}{2(n+1)}\right).$$

In summary, T is a Hermitian matrix, with n distinct eigenvalues. From Sect. 1.9.5, 
its eigenvectors are indeed orthogonal to each other. 
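
The eigenvalue formula can be confirmed numerically. The Python sketch below is an illustration with hypothetical names (`sine_mode`, `apply_T`, `eigenvalue`) and an arbitrarily chosen order n = 6. It applies T = tridiag(-1, 2, -1) without ever storing the zero elements.

```python
import math

def sine_mode(j, n):
    """The jth discrete sine mode: sampled at i/(n+1) and normalized."""
    scale = math.sqrt(2.0 / (n + 1))
    return [scale * math.sin(i * j * math.pi / (n + 1)) for i in range(1, n + 1)]

def apply_T(v):
    """Apply T = tridiag(-1, 2, -1) to v, row by row."""
    n = len(v)
    return [2 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

def eigenvalue(j, n):
    """The jth eigenvalue: 4 sin^2(j pi / (2(n+1)))."""
    return 4 * math.sin(j * math.pi / (2 * (n + 1))) ** 2

def dot(u, v):
    """Real inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))

n = 6  # hypothetical order, chosen for illustration
modes = [sine_mode(j, n) for j in range(1, n + 1)]

# Largest component of T v - lambda v over all modes
max_residual = max(abs(t - eigenvalue(j, n) * x)
                   for j, v in enumerate(modes, start=1)
                   for t, x in zip(apply_T(v), v))
# Largest deviation of the inner products from the identity matrix
max_gram_error = max(abs(dot(modes[j], modes[k]) - (1.0 if j == k else 0.0))
                     for j in range(n) for k in range(n))
```

Both `max_residual` and `max_gram_error` are at round-off level, so the modes are orthonormal eigenvectors of T, as asserted.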


1.10.3 The Sine Transform 


Now, let’s place the discrete sine modes in a new matrix, column by column: 

$$W = \left( v^{(1)} \mid v^{(2)} \mid \cdots \mid v^{(n)} \right).$$

By now, we already know that W is orthogonal. Because W is also symmetric, we have from Sect. 1.8.4 

$$W^{-1} = W^h = W^t = W.$$

W is called the sine transform: it transforms each n-dimensional vector u to a new 
vector of the same norm: 

$$\|Wu\| = \|u\|.$$

To uncover the original vector u, just apply W once again: 

$$u = W(Wu).$$


1.10.4 Diagonalization 


Now, let’s place the eigenvalues of T in a new diagonal matrix: 

$$\Lambda = \mathrm{diag}(\lambda_j)_{j=1}^n = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) = \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{pmatrix}.$$

Recall that the columns of W are the eigenvectors of T: 

$$TW = W\Lambda,$$

or 

$$T = W\Lambda W^{-1} = W\Lambda W.$$

This is called the diagonal form (or the diagonalization, or the spectral decomposition) 
of T. This is how the sine transform helps diagonalize the original matrix T. 


1.10.5 Sine Decomposition 


The sine transform W is a real, symmetric, and orthogonal matrix: 

$$WW = W^h W = W^t W = W^{-1} W = I.$$

For this reason, every vector $u \in \mathbb{C}^n$ can be written uniquely as a linear combination 
of the columns of W, with coefficients that are just the components of Wu: 

$$u = Iu = (WW)u = W(Wu) = \sum_{j=1}^n (Wu)_j v^{(j)}.$$

This way, u is decomposed in terms of more and more frequent (or oscillatory) waves, 
each multiplied by the corresponding amplitude $(Wu)_j$. 
For example, what is the smoothest part of u? It is now filtered out, in terms of 
the first discrete wave, times the first amplitude: 

$$(Wu)_1 v^{(1)}.$$

What is the next (more oscillatory) part? It is 

$$(Wu)_2 v^{(2)},$$

and so on, until the most oscillatory term 

$$(Wu)_n v^{(n)}.$$


1.10.6 Multiscale Decomposition 


The above can be viewed as a multiscale decomposition. The first discrete wave can 
be viewed as the coarsest scale, in which the original vector u is approximated rather 
poorly. The remainder (or the error) is 

$$u - (Wu)_1 v^{(1)}.$$

This is approximated by the second discrete wave, on the next finer scale. This 
contributes a finer term, to produce a better approximation of u. The up-to-date 
remainder is then 

$$u - (Wu)_1 v^{(1)} - (Wu)_2 v^{(2)}.$$

This is approximated by the third discrete wave, on the next finer scale, and so on. 
In the end, the most oscillatory (finest) term is added as well, to complete the entire 
multiscale decomposition. This is no longer an approximation: it is exactly u. 
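
The multiscale process can be followed step by step on the computer. The Python sketch below is a hedged illustration: the names `sine_mode`, `norm`, and `remainder_norms`, and the sample vector `u`, are assumptions. The amplitude $(Wu)_j$ is computed as the inner product of the jth sine mode with u, which is legitimate because W is symmetric.

```python
import math

def sine_mode(j, n):
    """The jth discrete sine mode: sampled at i/(n+1) and normalized."""
    scale = math.sqrt(2.0 / (n + 1))
    return [scale * math.sin(i * j * math.pi / (n + 1)) for i in range(1, n + 1)]

def norm(v):
    """l_2-norm of a real vector."""
    return math.sqrt(sum(x * x for x in v))

def remainder_norms(u):
    """Subtract the sine terms one by one, from coarsest to finest."""
    n = len(u)
    modes = [sine_mode(j, n) for j in range(1, n + 1)]
    # (Wu)_j: the jth row of W is the jth mode, since W is symmetric
    amplitudes = [sum(m * x for m, x in zip(mode, u)) for mode in modes]
    remainder = list(u)
    norms = [norm(remainder)]
    for a, mode in zip(amplitudes, modes):
        remainder = [r - a * m for r, m in zip(remainder, mode)]
        norms.append(norm(remainder))
    return amplitudes, norms

# Hypothetical real vector, chosen only for illustration
u = [1.0, 4.0, 2.0, -3.0, 0.5]
amplitudes, norms = remainder_norms(u)
```

The list `norms` never increases from one scale to the next, and its last entry vanishes up to round-off: once the finest term is added, the decomposition is exact.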


1.11 The Cosine Transform 


1.11.1 Discrete Cosine Waves 


Likewise, one could also define cosine waves. For a fixed $1 \le j \le n$, consider the 
function 

$$\cos((j-1)\pi x).$$

To discretize, just sample the above cosine function at n equidistant points between 
0 and 1: 

$$\frac{1}{2n}, \ \frac{3}{2n}, \ \frac{5}{2n}, \ \ldots, \ \frac{2n-1}{2n}.$$

This produces a new $v^{(j)}$: the discrete cosine mode, whose components are now 

$$v^{(j)}_i = \cos\left(\left(i - \frac{1}{2}\right)(j-1)\frac{\pi}{n}\right), \quad 1 \le i \le n$$

(Fig. 1.11). The normalization is left to the exercises below. 


1.11.2 Orthogonality of the Discrete Cosine Waves 


To show orthogonality, redefine T at its upper left and lower right corners: 

$$T_{1,1} = T_{n,n} = 1$$

rather than 2. This way, T takes the new form 

$$T = \begin{pmatrix} 1 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 1 \end{pmatrix}.$$

Fig. 1.11 The smoothest (nonconstant) cosine wave: cos(πx). Sample it at n discrete points: x = 
1/(2n), x = 3/(2n), x = 5/(2n), ..., x = (2n − 1)/(2n) 

Although the new matrix T is still real, tridiagonal, and symmetric, its eigenvectors 
are completely different from those in Sect. 1.10.2: they are now the discrete cosine 
waves. To see this, recall the formula: 


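$$\cos(\theta + \phi) + \cos(\theta - \phi) = 2\cos(\theta)\cos(\phi).$$

In particular, set 

$$\theta = \left(i - \frac{1}{2}\right)(j-1)\frac{\pi}{n} \quad \text{and} \quad \phi = (j-1)\frac{\pi}{n}.$$

With these $\theta$ and $\phi$, the above formula takes the form: 

$$\cos\left(\left(i + \frac{1}{2}\right)(j-1)\frac{\pi}{n}\right) + \cos\left(\left(i - \frac{3}{2}\right)(j-1)\frac{\pi}{n}\right) = 2\cos\left(\left(i - \frac{1}{2}\right)(j-1)\frac{\pi}{n}\right)\cos\left((j-1)\frac{\pi}{n}\right).$$

Thus, the discrete cosine wave defined in Sect. 1.11.1 satisfies 

$$Tv^{(j)} = \lambda_j v^{(j)},$$

with the new eigenvalue 

$$\lambda_j = 2 - 2\cos\left((j-1)\frac{\pi}{n}\right) = 4\sin^2\left((j-1)\frac{\pi}{2n}\right).$$

Because T is symmetric and these eigenvalues are distinct (see exercises below), its 
eigenvectors are orthogonal to each other (Sect. 1.9.5). 
This proves orthogonality of the discrete cosine modes, as asserted. 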
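
As with the sine modes, this can be verified numerically. The following Python sketch is an illustration under stated assumptions: the names `cosine_mode`, `apply_T`, and `eigenvalue`, and the order n = 6, are hypothetical. The modes are left unnormalized, as in Sect. 1.11.1, and n is assumed to be at least 2.

```python
import math

def cosine_mode(j, n):
    """The jth (unnormalized) discrete cosine mode, sampled at (2i-1)/(2n)."""
    return [math.cos((i - 0.5) * (j - 1) * math.pi / n) for i in range(1, n + 1)]

def apply_T(v):
    """Apply the new T: tridiag(-1, 2, -1), but with 1 at both corners."""
    n = len(v)
    out = []
    for i in range(n):
        diagonal = 1.0 if i in (0, n - 1) else 2.0
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append(diagonal * v[i] - left - right)
    return out

def eigenvalue(j, n):
    """The jth eigenvalue: 4 sin^2((j-1) pi / (2n))."""
    return 4 * math.sin((j - 1) * math.pi / (2 * n)) ** 2

def dot(u, v):
    """Real inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))

n = 6  # hypothetical order, chosen for illustration
modes = [cosine_mode(j, n) for j in range(1, n + 1)]

# Largest component of T v - lambda v over all modes
max_residual = max(abs(t - eigenvalue(j, n) * x)
                   for j, v in enumerate(modes, start=1)
                   for t, x in zip(apply_T(v), v))
# Largest inner product between two different modes
max_overlap = max(abs(dot(modes[j], modes[k]))
                  for j in range(n) for k in range(n) if j != k)
```

Both numbers are at round-off level. Note that the first mode is constant, and its eigenvalue is zero.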


1.11.3 The Cosine Transform 


We are now ready to normalize: 

$$v^{(j)} \leftarrow \frac{v^{(j)}}{\|v^{(j)}\|}$$

(see exercises below). Let’s place these (normalized) cosine modes in a new matrix, 
column by column: 

$$W = \left( v^{(1)} \mid v^{(2)} \mid \cdots \mid v^{(n)} \right).$$

This new W, known as the cosine transform, is still real and orthogonal, but no longer 
symmetric: 

$$W^{-1} = W^h = W^t \ne W.$$


1.11.4 Diagonalization 


For this reason, our new T is diagonalized in a slightly different way: 

$$T = W\Lambda W^{-1} = W\Lambda W^t \ne W\Lambda W,$$

where $\Lambda$ is now a new diagonal matrix, containing the new eigenvalues: 

$$\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n).$$


1.11.5 Cosine Decomposition 


Furthermore, every vector $u \in \mathbb{C}^n$ can now be decomposed uniquely in terms of 
discrete cosine waves as well: 

$$u = Iu = (WW^t)u = W(W^t u) = \sum_{j=1}^n (W^t u)_j v^{(j)}.$$

This is called the cosine decomposition of u. 


1.12 Positive (Semi)Definite Matrix 


1.12.1 Positive Semidefinite Matrix 


Consider again a general Hermitian matrix A. Assume that, for every (complex) 
vector v, 

$$(v, Av) = v^h Av \ge 0.$$

We then say that A is positive semidefinite. 
In particular, we could pick v to be an eigenvector of A, with the eigenvalue $\lambda$. 
This would give 

$$\|v\|^2 \lambda \ge 0.$$

Thus, all eigenvalues of A must be nonnegative. 

This also works the other way around: if all eigenvalues of a Hermitian matrix 
are nonnegative, then it must be positive semidefinite. Indeed, given a complex vector 
v, just decompose it as a linear combination of the (orthogonal) eigenvectors. 
Fortunately, we’ve already seen an example of a positive semidefinite matrix: T in 
Sect. 1.11.2, all of whose eigenvalues are nonnegative. 


1.12.2 Positive Definite Matrix 


Assume now that A is also nonsingular. This means that 0 is not an eigenvalue. In this 
case, for every nonzero vector $v \ne \mathbf{0}$, not only 

$$v^h Av \ge 0$$

but also 

$$v^h Av > 0.$$

We then say that A is not only positive semidefinite but also positive definite. In 
particular, pick v as an eigenvector of A, with the eigenvalue $\lambda$. In this case, 

$$\|v\|^2 \lambda > 0.$$

This shows that all eigenvalues of A are positive. 

This also works the other way around: if all eigenvalues of a Hermitian matrix 
are positive, then it must be positive definite. Indeed, given a nonzero vector v, just 
decompose it as a linear combination of the (orthogonal) eigenvectors. Fortunately, 
we’ve already seen an example of a positive definite matrix: T in Sect. 1.10.2, 
all of whose eigenvalues are positive. 
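
The difference between the two notions shows up clearly in a small experiment. The Python sketch below is only an illustration: the names `apply_T` and `quad_form`, the `corner` parameter, and the sample vectors are assumptions made here. It evaluates $v^h Tv$ for the matrix of Sect. 1.10.2 (corner value 2) and for the matrix of Sect. 1.11.2 (corner value 1).

```python
def apply_T(v, corner):
    """Apply tridiag(-1, 2, -1), with the given value at both corners."""
    n = len(v)
    out = []
    for i in range(n):
        diagonal = corner if i in (0, n - 1) else 2
        left = v[i - 1] if i > 0 else 0
        right = v[i + 1] if i < n - 1 else 0
        out.append(diagonal * v[i] - left - right)
    return out

def quad_form(v, corner):
    """v^h T v: a real number, because T is Hermitian."""
    Tv = apply_T(v, corner)
    return sum(complex(a).conjugate() * b for a, b in zip(v, Tv)).real

# Hypothetical nonzero test vectors (real and complex)
constant = [1, 1, 1, 1]
samples = [constant, [1, -1, 1, -1], [1j, 2, -1, 1 - 1j], [0, 0, 3, 0]]

sine_forms = [quad_form(v, corner=2) for v in samples]    # T of Sect. 1.10.2
cosine_forms = [quad_form(v, corner=1) for v in samples]  # T of Sect. 1.11.2
```

All entries of `sine_forms` are positive, in agreement with positive definiteness. In `cosine_forms`, on the other hand, the constant vector gives exactly zero: it is an eigenvector with the zero eigenvalue, so this T is positive semidefinite only. A few sample vectors are no proof, of course; they only illustrate the definitions.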


1.13 Exercises 


1.13.1 The Cauchy-Schwarz Inequality 


1. Let u and v be n-dimensional (complex) vectors: 

$$u = (u_1, u_2, \ldots, u_n)^t \quad \text{and} \quad v = (v_1, v_2, \ldots, v_n)^t.$$

Prove the Cauchy-Schwarz inequality: 

$$|(u, v)| \le \|u\| \cdot \|v\|,$$

or 

$$\left| \sum_{i=1}^n \bar{u}_i v_i \right|^2 = |(u, v)|^2 \le (u, u)(v, v) = \left( \sum_{i=1}^n |u_i|^2 \right) \left( \sum_{i=1}^n |v_i|^2 \right).$$

Hint: pick two different indices between 1 and n: $1 \le i \ne j \le n$. Note that 

$$\begin{aligned}
0 &\le |u_i v_j - u_j v_i|^2 \\
&= (\bar{u}_i \bar{v}_j - \bar{u}_j \bar{v}_i)(u_i v_j - u_j v_i) \\
&= \bar{u}_i \bar{v}_j u_i v_j - \bar{u}_i \bar{v}_j u_j v_i - \bar{u}_j \bar{v}_i u_i v_j + \bar{u}_j \bar{v}_i u_j v_i \\
&= |u_i|^2 |v_j|^2 - (\bar{u}_i v_i)(u_j \bar{v}_j) - (\bar{u}_j v_j)(u_i \bar{v}_i) + |u_j|^2 |v_i|^2,
\end{aligned}$$

so 

$$(\bar{u}_i v_i)(u_j \bar{v}_j) + (\bar{u}_j v_j)(u_i \bar{v}_i) \le |u_i|^2 |v_j|^2 + |u_j|^2 |v_i|^2.$$

Do the same with the plus sign, to obtain 

$$\left| (\bar{u}_i v_i)(u_j \bar{v}_j) + (\bar{u}_j v_j)(u_i \bar{v}_i) \right| \le |u_i|^2 |v_j|^2 + |u_j|^2 |v_i|^2.$$

2. Could the Cauchy-Schwarz inequality be an exact equality as well? Hint: only if 
u is proportional to v (or is a scalar multiple of v). 
3. Conclude the triangle inequality: 

$$\|u + v\| \le \|u\| + \|v\|.$$

Hint: 

$$\begin{aligned}
\|u + v\|^2 &= (u + v, u + v) \\
&= (u, u) + (u, v) + (v, u) + (v, v) \\
&= \|u\|^2 + (u, v) + (v, u) + \|v\|^2 \\
&\le \|u\|^2 + 2|(u, v)| + \|v\|^2 \\
&\le \|u\|^2 + 2\|u\| \cdot \|v\| + \|v\|^2 \\
&= (\|u\| + \|v\|)^2.
\end{aligned}$$
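
Before proving these inequalities, it may help to test them on a concrete example. The Python sketch below is not a proof, only a numerical sanity check. The vectors and the helper names `inner` and `norm` are hypothetical choices made for this illustration.

```python
import math

def inner(u, v):
    """Inner product (u, v) = u^h v."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def norm(v):
    """l_2-norm of a complex vector."""
    return math.sqrt(inner(v, v).real)

# Hypothetical sample vectors
u = [1 + 2j, -1, 3j, 0.5]
v = [2, 1 - 1j, -1j, 4]

# Cauchy-Schwarz: |(u, v)| <= ||u|| ||v||
cs_left = abs(inner(u, v))
cs_right = norm(u) * norm(v)

# Triangle inequality: ||u + v|| <= ||u|| + ||v||
tri_left = norm([a + b for a, b in zip(u, v)])
tri_right = norm(u) + norm(v)

# Equality case: w is proportional to v
c = 2 - 3j
w = [c * x for x in v]
equality_gap = abs(abs(inner(w, v)) - norm(w) * norm(v))
```

Here `cs_left` does not exceed `cs_right`, `tri_left` does not exceed `tri_right`, and `equality_gap` vanishes up to round-off, in agreement with the hint in Exercise 2.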


1.13.2 Generalized Eigenvalues and Eigenvectors 


1. Let A be a square complex matrix of order n. Let B be a Hermitian matrix of 
order n. An n-dimensional vector v is called a generalized eigenvector of A if 

$$Av = \lambda Bv \quad \text{and} \quad (v, Bv) \ne 0.$$

The scalar $\lambda$ is then called a generalized eigenvalue of A. 
2. Could v be the zero vector? 
3. Conclude that $A - \lambda B$ maps v to the zero vector. 
4. Conclude that v is in the null space of $A - \lambda B$. 
5. Is $A - \lambda B$ singular? Hint: otherwise, 

$$v = (A - \lambda B)^{-1}\mathbf{0} = \mathbf{0},$$

in violation of $(v, Bv) \ne 0$. 
6. Is $A^h - \bar{\lambda} B$ singular? Hint: otherwise, there would be a nonzero vector u mapped 
to v: 

$$(A^h - \bar{\lambda} B)u = v,$$

which would lead to a contradiction: 

$$0 = (\mathbf{0}, u) = ((A - \lambda B)v, u) = (v, (A^h - \bar{\lambda} B)u) = (v, v) > 0.$$

7. Conclude that $A^h - \bar{\lambda} B$ has a nontrivial null space. 
8. Conclude that $\bar{\lambda}$ is a generalized eigenvalue of $A^h$. 
9. Assume now that A is Hermitian as well. Must the generalized eigenvalue $\lambda$ be 
real? Hint: 

$$\lambda(v, Bv) = (v, \lambda Bv) = (v, Av) = (Av, v) = (\lambda Bv, v) = \bar{\lambda}(Bv, v) = \bar{\lambda}(v, Bv).$$

Finally, divide this by $(v, Bv) \ne 0$. 
10. Let u and v be two generalized eigenvectors: 

$$Av = \lambda Bv \quad \text{and} \quad Au = \mu Bu,$$

where $\lambda \ne \mu$ are two distinct generalized eigenvalues. Show that 

$$(u, Bv) = 0.$$

Hint: 

$$\begin{aligned}
\mu(u, Bv) &= \mu(Bu, v) \\
&= \bar{\mu}(Bu, v) \\
&= (\mu Bu, v) \\
&= (Au, v) \\
&= (u, Av) \\
&= (u, \lambda Bv) \\
&= \lambda(u, Bv).
\end{aligned}$$

1.13.3 Root of Unity and Fourier Transform 


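1. Let n be some natural number. Define the complex number 

$$w = \exp\left(\frac{2\pi\sqrt{-1}}{n}\right).$$

Show that w can also be written as 

$$w = \cos\left(\frac{2\pi}{n}\right) + \sqrt{-1}\sin\left(\frac{2\pi}{n}\right).$$

2. Show that w is the nth root of unity: 

$$w^n = \exp^n\left(\frac{2\pi\sqrt{-1}}{n}\right) = \exp\left(\frac{2\pi\sqrt{-1}}{n} n\right) = \exp(2\pi\sqrt{-1}) = 1.$$

3. Look at Fig. 1.12. How many roots of unity are there in it? Hint: for each 
$1 \le j \le n$, 

$$(w^j)^n = w^{jn} = w^{nj} = (w^n)^j = 1^j = 1.$$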


Fig. 1.12 The nth root of unity, and its powers in the complex plane: $w, w^2, w^3, \ldots$ 

4. Use w and its powers to design a new n × n complex matrix: 

$$W = n^{-1/2}\left( w^{(i-1)(j-1)} \right)_{1 \le i, j \le n}.$$

5. Let $1 \le j \le n$ be fixed. Show that the jth column in W could also be obtained 
from the exponent wave 

$$\exp(2\pi\sqrt{-1}(j-1)x),$$

sampled at the n equidistant points 

$$x = 0, \ \frac{1}{n}, \ \frac{2}{n}, \ \ldots, \ \frac{n-1}{n},$$

and divided by $n^{1/2}$. 
6. Show that the Hermitian adjoint of W is just the complex conjugate: 

$$W^h = \bar{W}.$$

7. Show that the first column in W is the constant column vector 

$$n^{-1/2}(1, 1, \ldots, 1)^t.$$

8. Show that this is a unit vector of norm 1. 
9. Show that this is indeed the first discrete cosine wave in Sect. 1.11.1 (in its 
normalized form). 
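10. Show that this eigenvector has the zero eigenvalue in Sect. 1.11.2. 
11. Conclude that the tridiagonal matrix T defined in Sect. 1.11.2 is singular (not 
invertible). 
12. Show that, in the new matrix W defined above, every column is a unit vector of 
norm 1 as well. 
13. Show that, for $1 < j \le n$, the jth column in W sums to zero: 

$$\sum_{i=1}^n w^{(i-1)(j-1)} = \sum_{i=0}^{n-1} w^{i(j-1)} = \frac{1 - w^{n(j-1)}}{1 - w^{j-1}} = \frac{1 - 1}{1 - w^{j-1}} = 0.$$

14. Multiply the above equation by $w^{(j-1)/2}$, to obtain 

$$\sum_{i=1}^n w^{\left(i - \frac{1}{2}\right)(j-1)} = 0$$

($1 < j \le n$). 
15. Look at the real part of this equation, to conclude that 

$$\sum_{i=1}^n \cos\left(\left(i - \frac{1}{2}\right)(j-1)\frac{2\pi}{n}\right) = 0$$

($1 < j \le n$). 
16. In the uniform grid in Sect. 1.11.1, sum the squares of sines and cosines: 

$$\begin{aligned}
&\sum_{i=1}^n \cos^2\left(\left(i - \frac{1}{2}\right)(j-1)\frac{\pi}{n}\right) + \sum_{i=1}^n \sin^2\left(\left(i - \frac{1}{2}\right)(j-1)\frac{\pi}{n}\right) \\
&\quad = \sum_{i=1}^n \left( \cos^2\left(\left(i - \frac{1}{2}\right)(j-1)\frac{\pi}{n}\right) + \sin^2\left(\left(i - \frac{1}{2}\right)(j-1)\frac{\pi}{n}\right) \right) \\
&\quad = \sum_{i=1}^n 1 \\
&\quad = n
\end{aligned}$$

($1 \le j \le n$). 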
17. Add these two formulas to each other, to conclude that the discrete cosine waves 
in Sect. 1.11.1 have norm 

$$\|v^{(j)}\| = \begin{cases} \sqrt{n} & \text{if } j = 1 \\ \sqrt{n/2} & \text{if } j > 1. \end{cases}$$

Hint: recall that $\cos(2\theta) = \cos^2(\theta) - \sin^2(\theta)$. 
18. Show that the eigenvalues in Sect. 1.11.2 are distinct. 
19. Conclude that the discrete cosine waves are indeed orthogonal to each other. 
20. Rewrite the zero column sums in W 

$$\sum_{i=1}^n w^{(i-1)(j-1)} = 0 \quad (1 < j \le n)$$

in the simpler form 

$$\sum_{i=0}^{n-1} w^{ij} = 0 \quad (1 \le j < n),$$

or 

$$\sum_{i=1}^{n-1} w^{ij} = -1 \quad (1 \le j < n).$$

21. Look at the real part of this equation, to conclude that 

$$\sum_{i=1}^{n-1} \cos\left(ij\frac{2\pi}{n}\right) = -1$$

($1 \le j < n$). 
22. Substitute n + 1 for n in the above equation, to read 

$$\sum_{i=1}^{n} \cos\left(ij\frac{2\pi}{n+1}\right) = -1$$

($1 \le j \le n$). 
23. In the uniform grid in Sect. 1.10.1, sum the squares of sines and cosines: 

$$\sum_{i=1}^n \cos^2\left(\frac{ij\pi}{n+1}\right) + \sum_{i=1}^n \sin^2\left(\frac{ij\pi}{n+1}\right) = \sum_{i=1}^n 1 = n$$

($1 \le j \le n$). 
24. Subtract these two formulas from each other, to confirm that the discrete sine 
waves introduced in Sect. 1.10.1 are indeed unit vectors of norm 1. 
Hint: recall that $\cos(2\theta) = \cos^2(\theta) - \sin^2(\theta)$. 
25. Show that the eigenvalues in Sect. 1.10.2 are distinct. 
26. Conclude that the discrete sine waves are indeed orthonormal. 

27. In Sect. 1.10.2, in the n × n matrix T, redefine the elements in the upper right 
and lower left corners as 

$$T_{1,n} = T_{n,1} = -1$$

rather than the original definition 

$$T_{1,n} = T_{n,1} = 0.$$

This way, T is no longer tridiagonal, but periodic: 

$$T = \begin{pmatrix} 2 & -1 & & & -1 \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ -1 & & & -1 & 2 \end{pmatrix},$$

or 

$$T_{i,j} = \begin{cases} 2 & \text{if } i = j \\ -1 & \text{if } |i - j| = 1 \text{ or } |i - j| = n - 1 \\ 0 & \text{otherwise} \end{cases}$$

($1 \le i, j \le n$). Show that this new T is no longer tridiagonal. 
28. Show that, in this new T, the rows sum to zero. 
29. Conclude that the first column of W (the constant n-dimensional unit vector) is 
an eigenvector of this new T, with the zero eigenvalue. 
30. Conclude that this new T is singular. 

More generally, show that the jth column of W (1 <j < n) is an eigenvector of 
this new T, with the new eigenvalue 
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32. 


33. 


34. 
35. 


36. 
37. 


38. 
39. 


40. 


41. 
42. 


43. 


44, 


45. 


Aj =2- (w/! + ot) = 2 — 2cos (2 — — = Asin? (=) : 
n 


Use the above to form the matrix equation 
TW=WA, 
where A is now the new n x n diagonal matrix 
A = diag(A, A2,---, An)- 


33. So far, we’ve seen that T has complex eigenvectors: the columns of W. Does it
have real eigenvectors as well?

34. Design them! Hint: follow the exercises below, one by one.

35. For this purpose, look at the jth column of W. Look at its real part. Is it an
eigenvector of T in its own right? Hint: Yes! After all, T and $\lambda_j$ are real.

36. What is its eigenvalue? Hint: $\lambda_j$.

37. Look again at the jth column of W. Look at its imaginary part. Is it an eigenvector
of T in its own right? Hint: Yes! After all, T and $\lambda_j$ are real.

38. What is its eigenvalue? Hint: $\lambda_j$.

39. Are the above eigenvalues different from each other? Hint: most of them are.
Only for 1 ≤ j < n/2 is the (j + 1)st eigenvalue the same as the (n − j + 1)st
one:

$$\lambda_{n-j+1} = 4\sin^2\left(\frac{\pi (n-j)}{n}\right) = 4\sin^2\left(\pi - \frac{\pi j}{n}\right) = 4\sin^2\left(\frac{\pi j}{n}\right) = \lambda_{j+1}.$$

40. Conclude that the (j + 1)st and the (n − j + 1)st columns of W are eigenvectors
of T, with the same eigenvalue.

41. Show that these column vectors are the complex conjugate of each other.

42. Conclude that their sum is twice their real part, which is an eigenvector of T as
well, with the same eigenvalue: $\lambda_{j+1}$.

43. Conclude also that the difference between them is $2\sqrt{-1}$ times their imaginary
part, which is an eigenvector of T as well, with the same eigenvalue: $\lambda_{j+1}$.

44. Show that those columns of W that correspond to different eigenvalues are indeed
orthogonal to each other. Hint: T is symmetric (Sect. 1.9.5).

45. Show that the (j + 1)st and the (n − j + 1)st columns of W, although having the
same eigenvalue $\lambda_{j+1}$, are orthogonal to each other as well:


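$$\begin{aligned}
\sum_{i=1}^{n} \bar{w}^{(i-1)j} w^{(i-1)(n-j)} &= \sum_{i=0}^{n-1} \bar{w}^{ij} w^{(n-j)i} \\
&= \sum_{i=0}^{n-1} w^{-ji} w^{(n-j)i} \\
&= \sum_{i=0}^{n-1} w^{(n-2j)i} \\
&= \sum_{i=0}^{n-1} w^{-2ji} \\
&= \frac{1 - w^{-2jn}}{1 - w^{-2j}} \\
&= \frac{1 - 1}{1 - w^{-2j}} \\
&= 0
\end{aligned}$$

(1 ≤ j < n/2).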


46. Use a similar calculation to verify directly that every two columns in W are
indeed orthogonal to each other. Hint: for every 1 ≤ k ≠ j ≤ n,

$$\sum_{i=1}^{n} \bar{w}^{(i-1)(k-1)} w^{(i-1)(j-1)} = \sum_{i=0}^{n-1} \bar{w}^{(k-1)i} w^{(j-1)i} = \sum_{i=0}^{n-1} w^{-(k-1)i} w^{(j-1)i}.$$

47. Show that the columns of W are also unit vectors of norm 1.

48. Conclude that the columns of W are orthonormal.

49. Conclude that W is a unitary matrix. (W is known as the discrete Fourier transform.)

50. Verify that W indeed satisfies

$$W^h W = \bar{W}^t W = W^{-1} W = I$$

and

$$W W^h = W \bar{W}^t = W W^{-1} = I,$$

where I is the n × n identity matrix.

51. Multiply the matrix equation

$$TW = W\Lambda$$

by $W^{-1}$ from the right, to obtain our new T in its diagonal form:

$$T = W \Lambda W^{-1}.$$

52. Write an efficient algorithm to calculate Wu, for any given vector $u \in \mathbb{C}^n$. The
solution can be found in Chap. 5 in [61].

53. Let $K = (K_{i,j})_{1 \le i,j \le n}$ be the n × n matrix with 1’s on the secondary diagonal
(from the upper right to the lower left corner), and 0’s elsewhere:

$$K_{i,j} = \begin{cases} 1 & \text{if } i + j = n + 1 \\ 0 & \text{otherwise.} \end{cases}$$

54. Show that K is both symmetric and orthogonal. Conclude that

$$K^2 = K^t K = I.$$

55. Verify that $K^2 = I$ by a direct calculation.

56. Conclude that K is a projection matrix.
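The exercises on the periodic T and on W can also be checked numerically. The sketch below is only an illustration, not the solution referred to in [61]: it assumes $w = \exp(2\pi\sqrt{-1}/n)$ and $W_{i,j} = w^{(i-1)(j-1)}/\sqrt{n}$ (stored with 0-based indices), and it uses the standard radix-2 Cooley-Tukey recursion, which works only when n is a power of 2. The function names are hypothetical.

```python
import cmath
import math

def dft_matrix(n):
    """W[i][j] = w**(i*j) / sqrt(n), with w = exp(2*pi*sqrt(-1)/n) (0-based i, j)."""
    w = cmath.exp(2j * math.pi / n)
    return [[w ** (i * j) / math.sqrt(n) for j in range(n)] for i in range(n)]

def periodic_T(n):
    """Periodic matrix (n >= 3): 2 on the diagonal, -1 next to it, cyclically."""
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        T[i][i] = 2.0
        T[i][(i + 1) % n] = -1.0
        T[i][(i - 1) % n] = -1.0
    return T

def matvec(A, x):
    """Plain O(n^2) matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def fft(u):
    """Unnormalized sums sum_k u[k] * w**(j*k); len(u) must be a power of 2."""
    n = len(u)
    if n == 1:
        return list(u)
    even, odd = fft(u[0::2]), fft(u[1::2])
    w = cmath.exp(2j * math.pi / n)
    out = [0j] * n
    for j in range(n // 2):
        t = w ** j * odd[j]
        out[j] = even[j] + t           # uses w**(j + n/2) = -w**j below
        out[j + n // 2] = even[j] - t
    return out

def apply_W(u):
    """W u in O(n log n) operations rather than O(n^2)."""
    s = math.sqrt(len(u))
    return [x / s for x in fft(u)]
```

For instance, with n = 8, each column of `dft_matrix(8)` should be an eigenvector of `periodic_T(8)` with eigenvalue $4\sin^2(\pi (j-1)/n)$, and `apply_W(u)` should agree with `matvec(dft_matrix(8), u)` up to rounding.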


Chapter 2
Vector Product with Applications
in Geometrical Mechanics


How to use vectors and matrices? Well, we’ve already seen a few important 
applications: the sine, cosine, and Fourier transforms. Here, on the other hand, we use 
matrices and their determinant to introduce yet another practical operation: vector 
product in 3-D. This will help define angular momentum and velocity, and establish 
the relation between them. 


2.1 The Determinant 


2.1.1 Minors and the Determinant 


For a real square matrix, the determinant is a real function: it maps the original 
matrix to a real number. For a complex matrix, on the other hand, the determinant 
is a complex function: it maps the original matrix to a new complex number: its 
determinant. To define the determinant, we must first define the minor. 

Let A be a square matrix of order n > 1. Let 1 ≤ i, j ≤ n be two fixed indices.
Define a slightly smaller (n − 1) × (n − 1) matrix: just drop from A its ith row and
jth column. The result is indeed a smaller matrix: just n − 1 rows and n − 1 columns.
This is the (i, j)th minor of A, denoted by $A^{(i,j)}$.

Thanks to the minors, we can now go ahead and define the determinant recursively. 
If A is very small and contains one entry only, then its determinant is just this entry. If, 
on the other hand, A is bigger than that, then its determinant is a linear combination 
of the determinants of those minors obtained by dropping the first row: 


$$\det(A) = \begin{cases} a_{1,1} & \text{if } n = 1 \\ \sum_{j=1}^{n} (-1)^{j+1} a_{1,j} \det\left(A^{(1,j)}\right) & \text{if } n > 1. \end{cases}$$

This kind of recursion could also be viewed as mathematical induction on n =
1, 2, 3, .... Indeed, for n = 1, det(A) is just the only element in A: $\det(A) = a_{1,1}$.
For n > 1, on the other hand, the minors are smaller matrices of order n − 1, whose
determinant has already been defined in the induction hypothesis, and can be used
to define det(A) as well. This completes the induction step, and indeed the entire
definition, as required. Later on, we’ll see yet another (equivalent) definition.
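The recursive definition translates almost word for word into code. Here is a minimal sketch in Python, assuming that the matrix is stored as a list of rows with 0-based indices; the names `minor` and `det` are just illustrative. It is meant to mirror the definition, not to be efficient: the cost grows like n!.

```python
def minor(A, i, j):
    """Drop row i and column j (0-based) from the square matrix A."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    # With 0-based j, the sign (-1)**j is the 1-based sign (-1)**(j + 1).
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(n))

print(det([[1, 2], [3, 4]]))                   # 1*4 - 2*3
print(det([[0, 1, 0], [1, 0, 0], [0, 0, 1]]))  # first two rows interchanged
```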


2.1.2 Examples 


For example, let I be the identity matrix of order n. Then,

$$\det(I) = 1.$$

This could be proved easily by mathematical induction. After all, in the above
formula, all minors vanish, except for the (1, 1)st one: the (n − 1) × (n − 1) identity
matrix.

For yet another example, let α be some scalar. Then,

$$\det(\alpha I) = \alpha^n.$$

This could be proved by mathematical induction as well. After all, in the above
formula, all minors vanish, except for the (1, 1)st one.

Another interesting example is the so-called switch matrix:

$$\det \begin{pmatrix} & 1 & & & \\ 1 & & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix} = -1.$$

Why is it called a switch matrix? Because once applied to a vector, it interchanges
its first and second components. Again, most of its minors vanish: only the (1, 2)nd
minor is nonzero: the (n − 1) × (n − 1) identity matrix. To contribute to the
determinant, it must pick a minus sign.

As a final example, look at the special case of n = 2. In this case, we have a small
2 × 2 matrix. Its determinant is

$$\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc.$$

Indeed, there are here two nonzero minors: the (1, 1)st minor is just the lower right
element d. To contribute to the determinant, it must be multiplied by a. The (1, 2)nd
minor, on the other hand, is the lower left element c. To contribute to the determinant,
it must be multiplied by b, and pick a minus sign.

Later on, we’ll interpret this determinant geometrically: the area of the
parallelogram that the columns (or the rows) of the 2 × 2 matrix make in the Cartesian
plane (this chapter, Sect. 2.3.3). This will be quite useful in special relativity later on
(Chap. 4).


2.1.3 Algebraic Properties 


Let’s discuss some general properties of the determinant, to be proved later in the 
book. Let A and B be square matrices of order n. The determinant of the product is 
the product of the individual determinants: 


det(AB) = det(A) det(B). 


(This will be proved in Chap. 14, Sect. 14.6.3.) This is quite useful. For example, 
what happens when two rows in A interchange? The determinant just picks a minus 
sign. For instance, to interchange the first and second rows in A, just apply the above 
switch matrix: 


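$$\det\left( \begin{pmatrix} & 1 & & & \\ 1 & & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix} A \right) = \det \begin{pmatrix} & 1 & & & \\ 1 & & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix} \det(A) = -\det(A).$$

For this reason, if both rows are the same, then the determinant must vanish:

$$\det(A) = -\det(A) = 0.$$

In this case, A must be singular: it has no inverse. Fortunately, this is a rather rare
case. More often, A has a nonzero determinant and is therefore invertible, as we’ll
see below.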

Another useful property is that the transpose has the same determinant: 

$$\det\left(A^t\right) = \det(A).$$

(This will be proved in Chap. 14, Sect. 14.6.2.) Thus, to calculate the determinant,
we could equally well work with columns rather than rows:

$$\det(A) = \begin{cases} a_{1,1} & \text{if } n = 1 \\ \sum_{i=1}^{n} (-1)^{i+1} a_{i,1} \det\left(A^{(i,1)}\right) & \text{if } n > 1. \end{cases}$$

This will be useful below.
Thanks to these two algebraic properties, we also have a third one: if Q is a real 
orthogonal matrix, then 


$$(\det(Q))^2 = \det\left(Q^t\right) \det(Q) = \det\left(Q^t Q\right) = \det(I) = 1,$$

so

$$\det(Q) = \pm 1.$$


Later on in this chapter, we’ll pick the correct sign. Before going into this, let’s use 
the determinant of a nonsingular matrix to design its inverse (Chap. 1, Sect. 1.5.3). 


2.1.4 The Inverse Matrix in Its Explicit Form 


If det(A) ≠ 0, then A is nonsingular: it has an inverse matrix. Indeed, in this case,
det(A) could be used to define $A^{-1}$ explicitly. In fact, each individual element in
$A^{-1}$ is given in terms of the transpose minor:

$$\left(A^{-1}\right)_{i,j} = (-1)^{i+j} \frac{\det\left(A^{(j,i)}\right)}{\det(A)}, \quad 1 \le i, j \le n.$$


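To check on this formula, let’s use it to calculate a few elements in $A^{-1}A = I$. Let’s
start from the upper left element:

$$\left(A^{-1}A\right)_{1,1} = \sum_{j=1}^{n} \left(A^{-1}\right)_{1,j} a_{j,1} = \frac{1}{\det(A)} \sum_{j=1}^{n} (-1)^{j+1} \det\left(A^{(j,1)}\right) a_{j,1} = \frac{\det(A)}{\det(A)} = 1$$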


(Sect. 2.1.3). This is indeed as required. 

Next, let’s check an off-diagonal element as well. For this purpose, let’s design a 
new matrix B as follows. B is nearly the same as A. Only its first column is different: 
it is the same as the second one. This way, both the first and second columns in 
B are the same as the second column in A. Thus, B must have a zero determinant 
(Sect. 2.1.3). Moreover, in B, most minors are the same as in A. So, we could work 
with B rather than A: 


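$$\begin{aligned}
\left(A^{-1}A\right)_{1,2} &= \sum_{j=1}^{n} \left(A^{-1}\right)_{1,j} a_{j,2} \\
&= \frac{1}{\det(A)} \sum_{j=1}^{n} (-1)^{j+1} \det\left(A^{(j,1)}\right) a_{j,2} \\
&= \frac{1}{\det(A)} \sum_{j=1}^{n} (-1)^{j+1} \det\left(B^{(j,1)}\right) b_{j,1} \\
&= \frac{\det(B)}{\det(A)} \\
&= 0,
\end{aligned}$$

as required. The explicit formula for $A^{-1}$ is called Cramer’s rule.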
For example, consider the case n = 2. In this case, we have a small 2 × 2 matrix.
Now, if its determinant is nonzero:

$$ad - bc \ne 0,$$

then its inverse is

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$

(Check!) Next, we use Cramer’s rule yet more efficiently.
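To see the explicit formula at work, here is a small sketch in Python. It assumes a list-of-rows matrix with integer (or rational) entries, and uses exact fractions to avoid rounding; the helper names are hypothetical. Note how element (i, j) of the inverse uses the transposed minor (j, i).

```python
from fractions import Fraction

def minor(A, i, j):
    """Drop row i and column j (0-based) from the square matrix A."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def inverse(A):
    """Explicit inverse: (-1)**(i+j) * det(minor (j, i)) / det(A)."""
    n, d = len(A), det(A)
    if d == 0:
        raise ValueError("singular matrix: zero determinant")
    if n == 1:
        return [[Fraction(1) / d]]
    return [[Fraction((-1) ** (i + j) * det(minor(A, j, i))) / d
             for j in range(n)] for i in range(n)]

print(inverse([[2, 1], [5, 3]]))
```

This is far too expensive for large matrices, but it is handy for checking small examples by hand.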


2.1.5 Cramer’s Rule 


It is too expensive to calculate all minors and their determinant. Fortunately, we
often don’t need the entire inverse matrix in its explicit form. Usually, we are given
a specific vector

$$v = (v_1, v_2, v_3, \ldots, v_n)^t,$$

and we only need to apply $A^{-1}$ to it.

For example, let’s calculate the first component of $A^{-1}v$. For this purpose, let’s
redefine B as a new matrix that is nearly the same as A. Only its first column is
different: it is the same as v. This way, we now have

$$\left(A^{-1}v\right)_1 = \sum_{j=1}^{n} \left(A^{-1}\right)_{1,j} v_j = \frac{1}{\det(A)} \sum_{j=1}^{n} (-1)^{j+1} \det\left(A^{(j,1)}\right) v_j = \frac{\det(B)}{\det(A)}.$$

Now, there is nothing special about the first component. In fact, let 1 ≤ i ≤ n be a
fixed index. To calculate the ith component of $A^{-1}v$, use a similar approach: redefine
B as a new matrix that is nearly the same as A. Only its ith column is now different:
it is the same as v. With this new B, the ith component in $A^{-1}v$ is

$$\left(A^{-1}v\right)_i = \frac{\det(B)}{\det(A)}.$$


This formula will be useful in applied geometry: barycentric coordinates in 3-D 
(Chap. 9). In this chapter, on the other hand, we use the determinant for yet another 
geometrical purpose: vector product in 3-D, with its physical applications. 
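Cramer’s rule is easy to try on a small system. The following sketch assumes integer entries and a nonzero determinant, and uses exact fractions; the function names are illustrative only. To get the ith component, it replaces the ith column of A by v, exactly as described above.

```python
from fractions import Fraction

def det(A):
    """Determinant by expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def cramer_component(A, v, i):
    """The ith component (0-based) of A^{-1} v, assuming det(A) is nonzero."""
    # B is the same as A, except for column i, which is replaced by v.
    B = [row[:i] + [v[k]] + row[i + 1:] for k, row in enumerate(A)]
    return Fraction(det(B), det(A))

A = [[2, 1], [1, 3]]
v = [3, 5]
print([cramer_component(A, v, i) for i in range(2)])
```

In this example, the system is 2x + y = 3 and x + 3y = 5, so one can verify the two components by substitution.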


2.2 Vector Product 


2.2.1 Standard Unit Vectors in 3-D 


So far, the determinant was defined algebraically. Still, does it have a geometrical 
meaning as well? To see this, let’s look at two vectors of the same dimension n. 

Can they multiply each other, to form a new vector? In general, they can’t: we 
could calculate their inner product, but this would be just a scalar, not a vector 
(Chap. 1, Sect. 1.7.1). Still, there is one exception: the three-dimensional Cartesian 
space, obtained by setting n = 3. In this space, a vector product could indeed be 
defined. 

For this purpose, define three standard unit vectors: 


$$i = (1, 0, 0)^t$$
$$j = (0, 1, 0)^t$$
$$k = (0, 0, 1)^t.$$


This way, i points in the positive x-direction, j points in the positive y-direction, and 
k points in the positive z-direction (Fig. 2.1). 

These are orthonormal vectors: orthogonal unit vectors. In what way are they
standard? Well, they align with the x-y-z axes in the three-dimensional space. In
this sense, they actually make the standard coordinate system that spans the entire
Cartesian space. Indeed, let

$$u = (u_1, u_2, u_3)^t \quad \text{and} \quad v = (v_1, v_2, v_3)^t$$

be some three-dimensional real vectors in $\mathbb{R}^3$. Thanks to the standard unit vectors,
they could also be written as

$$u = u_1 i + u_2 j + u_3 k \quad \text{and} \quad v = v_1 i + v_2 j + v_3 k.$$

This will be useful below.

Fig. 2.1 The right-hand rule: the horizontal unit vectors, i and j, produce the vertical unit vector i × j = k


2.2.2. Inner Product—Orthogonal Projection 


In the above, we’ve considered two real three-dimensional vectors: u and v. Their 
inner product is just the real scalar 


$$(u, v) = u^h v = u^t v = u_1 v_1 + u_2 v_2 + u_3 v_3.$$


This is actually a bilinear form: it takes two inputs (or arguments), u and v, to produce 
one new real number: their inner product. 

What is the geometrical meaning of this? Well, this is just an orthogonal projection. 
Consider, for instance, the x-axis, spanned by the unit vector i. What is the inner 
product of v with i? It is just 


$$(v, i) = v_1 \cdot 1 + v_2 \cdot 0 + v_3 \cdot 0 = v_1.$$

This is just the x-coordinate of v: the projection of v on the x-axis. In fact, if v makes
angle η with the positive part of the x-axis, then

$$\cos(\eta) = \frac{v_1}{\|v\|} = \frac{(v, i)}{\|v\|}$$

(Fig. 2.2).

Fig. 2.2 The vector v makes angle η with the positive part of the x-axis: $\cos(\eta) = v_1/\|v\|$

Fig. 2.3 The vector $\tilde{v}$ makes angle η with the unit vector $\tilde{i}$. Once $\tilde{v}$ projects onto the $\tilde{i}$-axis, we have $\cos(\eta) = (\tilde{v}, \tilde{i})/\|\tilde{v}\|$


Now, there is nothing special about the i-direction. Indeed, let’s rotate both v and
i by a fixed angle, to obtain the new vector $\tilde{v}$, and the new unit vector $\tilde{i}$ (Fig. 2.3).
This kind of rotation is an orthogonal transformation (see exercises below). As such,
it preserves inner product (Chap. 1, Sect. 1.8.2). Furthermore, the angle η is still the
same as before. Thus,

$$\cos(\eta) = \frac{(v, i)}{\|v\|} = \frac{(\tilde{v}, \tilde{i})}{\|\tilde{v}\|}.$$

You could also look at things the other way around. After all, the axis system is
picked arbitrarily. Why not pick the x-axis to align with the $\tilde{i}$-axis in Fig. 2.3? This
way, we’d have the same picture as in Fig. 2.2. Just rotate yourself, and look at it
from a slightly different angle!

In either point of view, the inner product remains the same: the orthogonal
projection of v (or $\tilde{v}$) onto the unit vector i (or $\tilde{i}$). This is good: the inner product can
now help calculate the cosine of the angle between the vectors. This angle will then
give a geometrical meaning to the vector product defined below.


2.2.3 Vector Product 


What do we want from a vector product? 


• It should take two inputs (or arguments): the original real three-dimensional vectors
u and v.

• Likewise, its output should be a real three-dimensional vector, not just a scalar. In
other words, once the symbol “×” is placed in between u and v, u × v should be
a new three-dimensional vector: their vector product.

• We already know that the inner product (u, v) is a bilinear form: it is linear in both
u and v. Likewise, the vector product u × v should be a bilinear operator: linear
in u and linear in v as well.

• Interchanging the arguments should just change sign:

$$v \times u = -(u \times v).$$

• If both arguments are the same, then the vector product should vanish:

$$u \times u = 0$$

(the origin).

• Take the triplet i, j, and k, and copy it periodically in a row:

i, j, k, i, j, k, i, j, k, ....

In this list, each pair should produce the next vector:

$$i \times j = k$$
$$j \times k = i$$
$$k \times i = j$$

(Fig. 2.1). This is called the right-hand rule. Later on, we’ll motivate it geometrically
as well.

How to define a good vector product, with all these properties? Fortunately, we can 
use the determinant. After all, thanks to its original definition, the determinant is just 
a linear combination of the items in the first row, which might be vectors in their own 
right: 


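$$\begin{aligned}
u \times v &= \det \begin{pmatrix} i & j & k \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{pmatrix} \\
&= i(u_2 v_3 - u_3 v_2) - j(u_1 v_3 - u_3 v_1) + k(u_1 v_2 - u_2 v_1) \\
&= (u_2 v_3 - u_3 v_2, \; u_3 v_1 - u_1 v_3, \; u_1 v_2 - u_2 v_1)^t.
\end{aligned}$$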


What’s so good about this new definition? Well, let’s see. 
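Before checking the properties by hand, it may help to play with the definition. The sketch below simply codes the three components of the expanded determinant, with vectors stored as 0-based tuples (an assumption of this example; the names `cross` and `dot` are illustrative).

```python
def cross(u, v):
    """Vector product of two 3-D vectors: the expanded 3 x 3 determinant."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(cross(i, j), cross(j, k), cross(k, i))  # should reproduce k, i, j

u, v = (1, 2, 3), (4, 5, 6)
print(cross(u, v), cross(v, u))               # opposite signs
print(dot(cross(u, v), u), dot(cross(u, v), v))
```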


2.2.4 The Right-Hand Rule 
In its new definition, does the vector product have the desirable properties listed
above? Well, let’s check:

• It is indeed bilinear: if w is yet another real three-dimensional vector, and α and
β are some real numbers, then

$$(\alpha u + \beta w) \times v = \alpha (u \times v) + \beta (w \times v)$$
$$u \times (\alpha v + \beta w) = \alpha (u \times v) + \beta (u \times w).$$

(Check!)

• Interchanging rows in a matrix changes the sign of its determinant. Therefore,

$$v \times u = \det \begin{pmatrix} i & j & k \\ v_1 & v_2 & v_3 \\ u_1 & u_2 & u_3 \end{pmatrix} = -\det \begin{pmatrix} i & j & k \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{pmatrix} = -(u \times v).$$

(Check!)

• As a result, a matrix with two identical rows must have a zero determinant.
Therefore,

$$u \times u = \det \begin{pmatrix} i & j & k \\ u_1 & u_2 & u_3 \\ u_1 & u_2 & u_3 \end{pmatrix} = (0, 0, 0)^t = 0.$$

(Check!)

• The right-hand rule indeed holds. Let’s verify this for the standard unit vectors.
(Later on, we’ll verify this for more general vectors as well.) For this purpose,
take your right hand, with your thumb pointing in the positive x-direction, and
your index finger pointing in the positive y-direction. Then, your middle finger
will point in the positive z-direction:

$$i \times j = \det \begin{pmatrix} i & j & k \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} = k.$$

Now, let your thumb point in the positive y-direction, and your index finger in the
positive z-direction. Then, your middle finger will point in the positive x-direction:

$$j \times k = \det \begin{pmatrix} i & j & k \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = i.$$

Finally, let your thumb point in the positive z-direction, and your index finger
in the positive x-direction. Then, your middle finger will point in the positive
y-direction:

$$k \times i = \det \begin{pmatrix} i & j & k \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} = j,$$

as required.


So far, we’ve considered standard unit vectors only. But what about more general 
vectors, like u and v above? 
Well, let’s focus on one component in u × v. For instance, what is the z-coordinate?
It is just

$$(u \times v)_3 = \det \begin{pmatrix} u_1 & u_2 \\ v_1 & v_2 \end{pmatrix} = u_1 v_2 - u_2 v_1.$$

Fig. 2.4 The right-hand rule: take your right hand, and match your thumb to $(u_1, u_2)$, and your index finger to $(v_1, v_2)$. Then, your middle finger will point upward, toward your own eyes, as indicated by the “⊙” at the origin


Is this positive? To check on this, let’s look at the two-dimensional subvector
$(u_1, u_2)^t \in \mathbb{R}^2$. Assume that it lies in the upper half of the Cartesian plane, where
$u_2 > 0$. (Otherwise, just switch to −u.) Likewise, look at the two-dimensional
subvector $(v_1, v_2)^t \in \mathbb{R}^2$, and assume that it lies in the upper half of the Cartesian plane,
where $v_2 > 0$ (Fig. 2.4). Let φ and θ be the angles that these subvectors make with the
positive part of the x-axis.

This way, to check whether $u_1 v_2 - u_2 v_1$ is positive or not, one could divide it by
$u_2 v_2 > 0$. When would this be positive? Only when

$$\operatorname{cotan}(\phi) = \frac{u_1}{u_2} > \frac{v_1}{v_2} = \operatorname{cotan}(\theta),$$

or

$$\phi < \theta,$$

as in Fig. 2.4. (After all, the cotangent function is monotonically decreasing.) In this
case, since its z-coordinate is positive, u × v points from the page outward, toward
your eyes. This is in agreement with the right-hand rule: take your right hand, and
match your thumb to u, and your index finger to v. Then, your middle finger will
point toward your own eyes, as required.

Finally, the vector product has yet another interesting property: u × v is orthogonal
to both u and v. Indeed, the inner product of u × v with either u or v produces a
matrix with two identical rows, with a zero determinant:

$$(u \times v)^t u = \det \begin{pmatrix} u_1 & u_2 & u_3 \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{pmatrix} = 0$$

$$(u \times v)^t v = \det \begin{pmatrix} v_1 & v_2 & v_3 \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{pmatrix} = 0.$$

This property will be useful in orthogonal transformations.


2.3 Orthogonalization


2.3.1 Invariance Under Orthogonal Transformation 


The vector product is (nearly) invariant under orthogonal transformation: up to a 
sign, ordering doesn’t matter: you could either apply the vector product and then 
transform, or do things the other way around: first transform, and then apply the 
vector product (Fig. 2.5). 

To see this, let Q be a 3 × 3 real orthogonal matrix. Let us show that, up to a sign,
Q preserves vector product:

$$Q(u \times v) = \pm (Qu) \times (Qv).$$


(The proper sign will be specified later.) 

Let us first show that Q(u × v) is proportional to (Qu) × (Qv), or orthogonal to
both Qu and Qv. Fortunately, as discussed in Chap. 1, Sect. 1.8.2, Q preserves inner
product, so

$$(Q(u \times v), Qu) = (u \times v, u) = 0$$
$$(Q(u \times v), Qv) = (u \times v, v) = 0.$$

This proves that Q(u × v) is indeed proportional to (Qu) × (Qv), as asserted. But
what about their magnitude? Is it the same? To see this, just take their inner product:

$$\begin{aligned}
((Qu) \times (Qv), Q(u \times v)) &= \det \begin{pmatrix} (Q(u \times v))^t \\ (Qu)^t \\ (Qv)^t \end{pmatrix} \\
&= \det \begin{pmatrix} (u \times v)^t Q^t \\ u^t Q^t \\ v^t Q^t \end{pmatrix} \\
&= \det \left( \begin{pmatrix} (u \times v)^t \\ u^t \\ v^t \end{pmatrix} Q^t \right) \\
&= \det \begin{pmatrix} (u \times v)^t \\ u^t \\ v^t \end{pmatrix} \det\left(Q^t\right) \\
&= (u \times v, u \times v) \det(Q) \\
&= (Q(u \times v), Q(u \times v)) \det(Q) \\
&= \begin{cases} \|Q(u \times v)\|^2 & \text{if } \det(Q) = 1 \\ -\|Q(u \times v)\|^2 & \text{if } \det(Q) = -1. \end{cases}
\end{aligned}$$

So,

$$(Qu) \times (Qv) = \begin{cases} Q(u \times v) & \text{if } \det(Q) = 1 \\ -Q(u \times v) & \text{if } \det(Q) = -1. \end{cases}$$

Fig. 2.5 Ordering doesn’t matter: you could either apply the orthogonal transformation and then the vector product, or work the other way around: apply the vector product first, and then the orthogonal transformation
In summary, ordering is (nearly) immaterial: making the vector product and then 
the orthogonal transformation is the same (up to a sign) as making the orthogonal 
transformation and then the vector product. 

In particular, vector product is invariant under an orthogonal transformation with 
determinant 1 (rotation). This means that vector product is purely geometrical: it 
is independent of the coordinate system that happens to be used. For this reason, 
the vector product must have a pure geometrical interpretation, free of any tedious 
algebraic detail. To see this, let’s switch to a more convenient axis system, which is 
no longer absolute, but relative to the original vectors u and v. 
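A quick numerical experiment illustrates both signs. The sketch below picks two particular orthogonal matrices (these choices are assumptions of the example, not taken from the text): a rotation about the z-axis, with determinant 1, and a reflection in the x-y plane, with determinant −1.

```python
import math

def cross(u, v):
    """Vector product of two 3-D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def matvec(Q, x):
    """Apply the 3 x 3 matrix Q (a list of rows) to the vector x."""
    return tuple(sum(q * xi for q, xi in zip(row, x)) for row in Q)

def rotation_z(theta):
    """Rotation by angle theta about the z-axis: determinant 1."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Reflection in the x-y plane: orthogonal, with determinant -1.
REFLECTION = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]

def sign_under(Q, u, v, tol=1e-9):
    """+1 if Q(u x v) = (Qu) x (Qv), -1 if Q(u x v) = -(Qu) x (Qv), else 0."""
    lhs = matvec(Q, cross(u, v))
    rhs = cross(matvec(Q, u), matvec(Q, v))
    if all(abs(a - b) < tol for a, b in zip(lhs, rhs)):
        return 1
    if all(abs(a + b) < tol for a, b in zip(lhs, rhs)):
        return -1
    return 0

u, v = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)
print(sign_under(rotation_z(0.7), u, v), sign_under(REFLECTION, u, v))
```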


2.3.2 Relative Axis System: Gram-Schmidt Process


Let us use the above properties to design a new coordinate system in $\mathbb{R}^3$. For this
purpose, assume that u and v are linearly independent of each other: they are not
a scalar multiple of each other. In this case, we can form a new 3 × 3 orthogonal
matrix:

$$O = \left( v^{(1)} \mid v^{(2)} \mid v^{(3)} \right).$$

What are these columns? Well, they should orthonormalize the original vectors u and
v:

1. First, normalize u:

$$v^{(1)} = \frac{u}{\|u\|}.$$

This way, $v^{(1)}$ is the unit vector proportional to u.

2. Then, as in Sect. 2.2.2, project v on $v^{(1)}$, and subtract:

$$v^{(2)} = v - \left(v^{(1)}, v\right) v^{(1)}.$$

This is called a Gram-Schmidt process. Because u and v are linearly independent,
$v^{(2)} \ne 0$.

By now, we’ve orthogonalized v with respect to u: $v^{(2)}$ is now orthogonal to $v^{(1)}$.
Indeed, since $v^{(1)}$ is a unit vector,

$$\left(v^{(1)}, v^{(2)}\right) = \left(v^{(1)}, v - \left(v^{(1)}, v\right) v^{(1)}\right) = \left(v^{(1)}, v\right) - \left(v^{(1)}, v\right) = 0.$$

3. Next, normalize $v^{(2)}$ as well:

$$v^{(2)} \leftarrow \frac{v^{(2)}}{\|v^{(2)}\|}.$$

This way, $v^{(1)}$ and $v^{(2)}$ are now orthonormal and span the same plane as the
original vectors u and v.

4. Next, define a new vector, orthogonal to both u and v:

$$v^{(3)} = v^{(1)} \times v^{(2)}.$$

5. Finally, normalize $v^{(3)}$ as well:

$$v^{(3)} \leftarrow \frac{v^{(3)}}{\|v^{(3)}\|}.$$

We’ll soon realize that this normalization is actually unnecessary: $v^{(3)}$ is already a
unit vector. Still, by now, we don’t know this as yet: we only know that $v^{(3)}$ is a unit
vector that points in the same direction as $v^{(1)} \times v^{(2)}$.

In summary, O has real orthonormal columns. As such, O is an orthogonal matrix:

$$O^t O = O O^t = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Thus, the vector product is invariant under O. Indeed, in Sect. 2.3.1, the relevant sign
is the plus sign, and O must have determinant 1:

$$v^{(1)} \times v^{(2)} = (Oi) \times (Oj) = O(i \times j) = Ok = v^{(3)}.$$

So, there was never any need to normalize $v^{(3)}$: it was a unit vector all along.

What are $v^{(1)}$, $v^{(2)}$, and $v^{(3)}$ geometrically? A new axis system! $v^{(1)}$ points in the
positive direction of the new x-axis, $v^{(2)}$ points in the positive direction of the new
y-axis, and $v^{(3)}$ points in the positive direction of the new z-axis.

Each three-dimensional vector can now be written in terms of these new
coordinates. As an exercise, let’s reconstruct u. Fortunately, u is orthogonal to both $v^{(2)}$
and $v^{(3)}$. Thus, the general expansion in Chap. 1, Sect. 1.11.5, reduces to

$$u = Iu = \left(O O^t\right) u = O\left(O^t u\right) = \sum_{j=1}^{3} \left(O^t u\right)_j v^{(j)} = \left(O^t u\right)_1 v^{(1)} = \left(v^{(1)}, u\right) v^{(1)} = \|u\| v^{(1)}.$$

What does this mean geometrically? It means that u is confined to the new x-axis,
as required. In fact, in the new coordinates, u takes the simple form of

$$O^t u = \begin{pmatrix} \|u\| \\ 0 \\ 0 \end{pmatrix}.$$

v, on the other hand, is expanded in two terms: its orthogonal projections on $v^{(1)}$ and
on $v^{(2)}$. After all, v is orthogonal to $v^{(3)}$, but not necessarily to $v^{(1)}$, let alone to $v^{(2)}$:

$$\begin{aligned}
v &= \left(O^t v\right)_1 v^{(1)} + \left(O^t v\right)_2 v^{(2)} \\
&= \left(v^{(1)}, v\right) v^{(1)} + \left(v^{(2)}, v\right) v^{(2)} \\
&= \left(v^{(1)}, v\right) v^{(1)} + \left(v^{(2)}, v - \left(v^{(1)}, v\right) v^{(1)}\right) v^{(2)} \\
&= \left(v^{(1)}, v\right) v^{(1)} + \left\| v - \left(v^{(1)}, v\right) v^{(1)} \right\| v^{(2)}.
\end{aligned}$$

What does this mean geometrically? It means that v is confined to the new x-y plane.
In fact, in the new coordinates, v is actually two-dimensional: it has just two nonzero
coordinates:

$$O^t v = \begin{pmatrix} \left(v^{(1)}, v\right) \\ \left\| v - \left(v^{(1)}, v\right) v^{(1)} \right\| \\ 0 \end{pmatrix}.$$

Thanks to this form, we can now redefine u × v geometrically rather than algebraically.
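The five steps above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: vectors are 0-based tuples of floats, and linear dependence is detected by an exact zero norm, which is naive in floating point. The names `relative_axes` and `new_coordinates` are hypothetical. The last normalization (step 5) is skipped, since it turns out to be unnecessary.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def norm(u):
    return math.sqrt(dot(u, u))

def scale(a, u):
    return tuple(a * x for x in u)

def relative_axes(u, v):
    """Columns v1, v2, v3 of O, built from linearly independent u and v."""
    v1 = scale(1.0 / norm(u), u)                        # step 1: normalize u
    proj = scale(dot(v1, v), v1)                        # projection of v on v1
    v2 = tuple(a - b for a, b in zip(v, proj))          # step 2: subtract
    n2 = norm(v2)
    if n2 == 0.0:
        raise ValueError("u and v are linearly dependent")
    v2 = scale(1.0 / n2, v2)                            # step 3: normalize
    v3 = cross(v1, v2)                                  # step 4: already a unit vector
    return v1, v2, v3

def new_coordinates(axes, x):
    """O^t x: the coordinates of x in the new axis system."""
    return tuple(dot(a, x) for a in axes)

axes = relative_axes((1.0, 2.0, 2.0), (0.0, 3.0, 4.0))
print(new_coordinates(axes, (1.0, 2.0, 2.0)))  # u: only the first coordinate is nonzero
print(new_coordinates(axes, (0.0, 3.0, 4.0)))  # v: the third coordinate vanishes
```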


2.3.3 Angle Between Vectors 


Like O, $O^t$ is a rotation matrix: a real orthogonal matrix, with determinant 1.
Therefore, the vector product is invariant under $O^t$ as well, and can be calculated in the
new coordinates, using the above form.

In the new coordinates, the original (algebraic) definition gets very simple. This
is also apparent geometrically. After all, v contains only two nonzero coordinates:
the $v^{(1)}$-coordinate (the new x-coordinate), and the $v^{(2)}$-coordinate (the new
y-coordinate). In u × v, only the latter coordinate is relevant. The former (which
is proportional to u) drops, contributing nothing to u × v:

$$\begin{aligned}
u \times v &= \|u\| \cdot \left\| v - \left(v^{(1)}, v\right) v^{(1)} \right\| v^{(1)} \times v^{(2)} \\
&= \|u\| \cdot \|v\| \frac{\left\| v - \left(v^{(1)}, v\right) v^{(1)} \right\|}{\|v\|} v^{(3)} \\
&= \|u\| \cdot \|v\| \sin(\eta) v^{(3)},
\end{aligned}$$

where η is the angle between u and v in the u-v plane (the new x-y plane):

$$\cos(\eta) = \frac{\left(O^t v\right)_1}{\|v\|} = \frac{\left(v^{(1)}, v\right)}{\|v\|} = \frac{(u, v)}{\|u\| \cdot \|v\|}$$

(Fig. 2.3).

So, what is the norm ||u x v||? It is just the area of the parallelogram that u and 
v make in the plane that they span: the new x-y plane. This is a pure geometrical 
interpretation, independent of any coordinate system, and free of any algebraic detail. 
No wonder it is so useful in physics. 
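This geometrical interpretation can be tested numerically: the norm of the vector product should agree with the product of the norms times the sine of the angle. The sketch below is a plain floating-point check, with illustrative function names; the angle is obtained from the inner product, which loses accuracy for nearly parallel vectors.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def norm(u):
    return math.sqrt(dot(u, u))

def angle_between(u, v):
    """The angle between u and v, from cos = (u, v) / (|u| |v|)."""
    c = dot(u, v) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding

def parallelogram_area(u, v):
    """Area of the parallelogram that u and v make: the norm of u x v."""
    return norm(cross(u, v))

u, v = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)
print(parallelogram_area(u, v))
print(norm(u) * norm(v) * math.sin(angle_between(u, v)))
```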


2.4 Linear and Angular Momentum 


2.4.1 Linear Momentum 


The vector product introduced above is particularly useful in geometrical physics. To
see this, consider a particle of mass m, traveling in the three-dimensional Cartesian
space. At time t, it is at position

$$r = r(t) = \begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix} \in \mathbb{R}^3.$$

Later on, in quantum mechanics, we’ll see that this is not so simple. Still, for the
time being, let’s accept this.

To obtain the velocity of the particle, differentiate with respect to time:

$$r' = r'(t) = \begin{pmatrix} x'(t) \\ y'(t) \\ z'(t) \end{pmatrix} \in \mathbb{R}^3.$$

Fig. 2.6 At time t, the particle is at $r = r(t) \in \mathbb{R}^3$, with a linear momentum $p = p(t) \in \mathbb{R}^3$. This momentum could split into two parts: the radial part is proportional to r, whereas the other part is perpendicular to r

To obtain the linear momentum, multiply by m, the mass of the particle: 


$$p = p(t) = m r'(t) \in \mathbb{R}^3.$$


Later on, in special relativity, we’ll redefine the linear momentum more carefully.
Still, for the time being, let’s accept this.

Finally, to have the force, differentiate p, to obtain the new vector p′(t). In summary,
at each particular time t, the linear momentum p describes the full motion of
the particle, telling us how it moves in each and every spatial direction. Still, we’ll
soon see that only two spatial directions are relevant: the third one vanishes.

Indeed, p will soon split into two orthogonal components. The first will tell us how 
fast the particle gets farther and farther away from the origin, or how fast ||r|| grows. 
This component must be radial: proportional to r, pointing in the same direction as 
r. In fact, it is just the orthogonal projection of p onto the unit vector r/||r||: 


$$\frac{(r, p)}{\|r\|^2} r = \left( \frac{r}{\|r\|}, p \right) \frac{r}{\|r\|} = \cos(\eta) \|p\| \frac{r}{\|r\|},$$

where η is the angle between p and r (Fig. 2.6 and Sect. 2.2.2). This radial component
indeed tells us how fast the particle gets away from the origin, or how fast ‖r‖ grows
in time, as required.


2.4.2 Angular Momentum 


Still, this is not the whole story. After all, as time goes by, r may not only change
magnitude but also change direction. To understand this part of the motion, let’s look
at the other component of p, which will tell us how fast the particle rotates about the
origin.

Where does this rotation take place? Well, infinitesimally, it takes place in the r-p 
plane: the plane spanned by r and p. In other words, this is the plane orthogonal to 
the vector product r x p. 

So, the second component of p must be nonradial: orthogonal (or perpendicular) 
to the former component. This way, it will indeed tell us how fast the particle rotates 
about a new vector: the angular momentum r x p. 

What is the norm of this vector? We already know what it is: 


$$\|r \times p\| = \|r\| \cdot \|p\| \sin(\eta)$$

(Sect. 2.3.3). To rotate about r × p, the particle must make a small (infinitesimal)
arc in the r-p plane. For this purpose, the second component of p must be tangent to
this arc, or orthogonal not only to r × p but also to r (Fig. 2.6). In summary, it must
be proportional to

$$(r \times p) \times r.$$

Fortunately, we already know the norm of this vector:

$$\|(r \times p) \times r\| = \|r \times p\| \cdot \|r\| \sin\left(\frac{\pi}{2}\right) = \|r \times p\| \cdot \|r\| = \|p\| \sin(\eta) \|r\|^2.$$

So, to have the second (nonradial) component of p in its properly scaled form, we
must divide by $\|r\|^2$:

$$\frac{r \times p}{\|r\|^2} \times r = \frac{r \times p}{\|r\|} \times \frac{r}{\|r\|}.$$

Indeed, the norm of this vector is

$$\left\| \frac{r \times p}{\|r\|^2} \times r \right\| = \sin(\eta) \|p\|,$$

as required. In summary, we now have the complete orthogonal decomposition of
the original linear momentum:

$$p = \frac{(r, p)}{\|r\|^2} r + \frac{r \times p}{\|r\|^2} \times r = \left( \frac{r}{\|r\|}, p \right) \frac{r}{\|r\|} + \frac{r \times p}{\|r\|} \times \frac{r}{\|r\|}.$$


What is so nice about this decomposition? Well, it is uniform: both terms are written 
in the same style. The only difference is that the former term uses inner product, 
whereas the latter term uses vector product. 

Furthermore, both terms are orthogonal (perpendicular) to each other. After all, 
this is how they have been designed in the first place: the former is proportional to 
r, whereas the latter is perpendicular to r. This is why they also satisfy Pythagoras’ 
theorem: the sum of squares of their norms is 


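$$\begin{aligned}
\left\| \frac{(r, p)}{\|r\|} \frac{r}{\|r\|} \right\|^2 + \left\| \frac{r \times p}{\|r\|} \times \frac{r}{\|r\|} \right\|^2
&= \left( \frac{(r, p)}{\|r\|} \right)^2 + \left( \frac{\|r \times p\|}{\|r\|} \right)^2 \\
&= \cos^2(\eta) \|p\|^2 + \sin^2(\eta) \|p\|^2 \\
&= \left( \cos^2(\eta) + \sin^2(\eta) \right) \|p\|^2 \\
&= \|p\|^2,
\end{aligned}$$

as required.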
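The decomposition is easy to test on a concrete example. The sketch below assumes r ≠ 0 and uses plain floating-point tuples; the function name `decompose` is only illustrative. It returns the radial and nonradial components of p, which should add up to p, be orthogonal to each other, and satisfy Pythagoras’ theorem.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def scale(a, u):
    return tuple(a * x for x in u)

def decompose(r, p):
    """Split p into its radial part and its part perpendicular to r (r nonzero)."""
    rr = dot(r, r)                                    # squared norm of r
    radial = scale(dot(r, p) / rr, r)                 # ((r, p) / |r|^2) r
    nonradial = cross(scale(1.0 / rr, cross(r, p)), r)  # ((r x p) / |r|^2) x r
    return radial, nonradial

r, p = (1.0, 2.0, 2.0), (3.0, -1.0, 4.0)
radial, nonradial = decompose(r, p)
print(radial, nonradial)
print(dot(radial, radial) + dot(nonradial, nonradial), dot(p, p))
```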


2.5 Angular Velocity 


2.5.1 Angular Velocity 


What is momentum? It is mass times velocity. This is true not only in one but also
in three spatial dimensions. So, to obtain the velocity vector, just divide by the mass
of the particle:

$$\frac{p}{m} = \frac{r \times p}{m\|r\|^2} \times r + \frac{(r, p)}{m\|r\|^2} r.$$

This is just a special case of a more general (not necessarily orthogonal)
decomposition:

$$v = u + w,$$

where

$$u = \omega \times r,$$

and

$$\omega = \omega(t) = \begin{pmatrix} \omega_1(t) \\ \omega_2(t) \\ \omega_3(t) \end{pmatrix} \in \mathbb{R}^3$$

is a new vector: the angular velocity. (Don’t confuse it with the other vector w.) This
way, the particle rotates about ω. The norm ‖ω‖ tells us how fast: by what angle per
second. By definition, u must be perpendicular to both ω and r (Fig. 2.7).

The angular velocity is time-dependent: ω may change in time, not only in
magnitude but also in direction. For simplicity, however, we often look at some fixed
time. This way, the argument “(t)” may drop.

Fig. 2.7 For the sake of better visualization, we assume that the angular velocity ω is perpendicular to r. This way, it points from the page toward your eyes, as indicated by the “⊙” at the origin. (Don’t confuse ω with the other vector w!)


2.5.2 The Rotating Axis System 


In general, ω may point in just any direction, not necessarily perpendicular to r
or v. (See exercises below.) For simplicity, however, we often assume that ω is
perpendicular to r:

\[
(\omega, r) = 0.
\]

Otherwise, just redefine the origin, and shift it along the ω-axis, until the new ω and r
are orthogonal. This way, ω-r-u make a new right-hand system, rotating around
the ω-axis. In terms of these new coordinates, the particle may “feel” a few new
forces.


2.5.3 Velocity and Its Decomposition 


As we’ve seen so far, at time t, the particle rotates (infinitesimally) around the ω-axis
(at a rate of angle ||ω|| per second), making an infinitesimal arc. In our velocity

\[
v = u + w,
\]

the former term

\[
u = \omega \times r
\]

is tangent to this arc. The remainder

\[
w = v - u,
\]

on the other hand, may be nontangential, and even perpendicular to the arc.

For the sake of better visualization, we often assume that ω and r are perpendicular
to each other, as in Fig. 2.7. This way, ω-r-u form a new right-hand system: the
rotating axis system.

Only in our final example does w contain a component parallel to ω. In most of
our discussion, on the other hand, w is orthogonal to ω too, and so is the entire
velocity v, as in Fig. 2.7. In this case, it makes sense to define

\[
\omega = \frac{r \times p}{m\|r\|^2},
\]

in agreement with the formula at the beginning of Sect. 2.5.1. This way, w is radial,
so ω-w-u make the same right-hand system as ω-r-u: the rotating coordinate system
(Sect. 2.5.2). 
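
To see this definition at work, here is a small numerical sketch. It is an illustration under stated assumptions, not the book's own code: the function name `angular_velocity`, the mass, and the sample vectors are all hypothetical. It builds ω from r and p, forms the tangential part u = ω × r, and checks that the remainder w = v − u is radial.

```python
def dot(a, b):
    """Inner product (a, b) of two three-dimensional vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Vector product a x b of two three-dimensional vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def angular_velocity(r, p, m):
    """omega = (r x p) / (m ||r||^2), as defined in the text."""
    rr = dot(r, r)
    return tuple(x / (m * rr) for x in cross(r, p))

m = 2.0                # sample mass (assumed)
r = (1.0, 2.0, 2.0)    # sample position (assumed)
p = (3.0, -1.0, 0.5)   # sample linear momentum (assumed)

v = tuple(x / m for x in p)                # velocity: momentum divided by mass
omega = angular_velocity(r, p, m)
u = cross(omega, r)                        # tangential part of v
w = tuple(a - b for a, b in zip(v, u))     # remainder of v

print("omega:", omega)
print("u:", u)
print("w:", w)
print("(omega, r):", dot(omega, r))        # zero: omega is perpendicular to r
print("w x r:", cross(w, r))               # zero vector: w is radial
```

With this choice of ω, the remainder w is parallel to r, so v = u + w is the orthogonal decomposition obtained above.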


2.6 Real and Fictitious Forces 


2.6.1 The Centrifugal Force 


What is the centrifugal force? This is the force that the particle “feels” in its own 
ideal “world”: the rotating coordinate system. 
In general (even if ω is not orthogonal to r), the centrifugal force is

\[
-m\,\omega \times (\omega \times r).
\]

To illustrate, it is convenient to assume that ω and r are orthogonal to each other:

\[
(\omega, r) = 0.
\]

This way,

\[
\|\omega \times r\| = \|\omega\| \cdot \|r\|,
\]

and the centrifugal force is radial:

\[
-m\,\omega \times (\omega \times r) = m\|\omega\|^2 r.
\]


Still, this is “felt” in the rotating axis system only. In reality, on the other hand, there 
is no centrifugal force at all. Indeed, in the static axis system used in Fig. 2.8, the 
real force is just p’ = mv’. So long as v’ = 0, there is no force at all: Newton’s first 
law holds, and the linear momentum is conserved. As a result, v can never change 
physically: what could change is just its writing style in rotating coordinates. This 
nonphysical “change” is due to the fictitious centrifugal “force”. 
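
The radial form of the centrifugal force is easy to verify numerically. The sketch below is only an illustration: the function name `centrifugal_force` and the sample values of m, ω, and r are assumptions, chosen so that ω is perpendicular to r.

```python
def dot(a, b):
    """Inner product (a, b) of two three-dimensional vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Vector product a x b of two three-dimensional vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def centrifugal_force(m, omega, r):
    """General formula: -m omega x (omega x r)."""
    return tuple(-m * x for x in cross(omega, cross(omega, r)))

m = 1.5                    # sample mass (assumed)
omega = (0.0, 0.0, 3.0)    # angular velocity along the z-axis (assumed)
r = (2.0, 0.0, 0.0)        # position perpendicular to omega (assumed)

force = centrifugal_force(m, omega, r)
radial = tuple(m * dot(omega, omega) * x for x in r)   # m ||omega||^2 r

print("centrifugal force:", force)
print("m ||omega||^2 r:  ", radial)
```

When (ω, r) = 0, both expressions agree, and the force points along r, away from the ω-axis.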

Still, the rotating coordinates are legitimate too, and we might want to work in 
them. In fact, if the particle stayed at the same rotating coordinates (0, ||r||, 0) all 
the time, then it would rotate physically round and round forever. For this purpose, 
a new counterforce must be applied, to cancel the centrifugal force out.

Fig. 2.8 The fictitious centrifugal force: −mω × (ω × r). If ω and r are orthogonal to each other,
then it is also radial: m||ω||²r. Labels in the figure: m||ω||²r, the centrifugal force;
u = ω × r; ω ⊙.


2.6.2 The Centripetal Force 


How to balance (or cancel) the centrifugal force? For this purpose, let’s go ahead 
and “connect” the particle to the origin by a wire. In rotating coordinates, this reacts 
to the centrifugal force. Indeed, thanks to Newton’s second law, this supplies the 
required counterforce—the centripetal force: 


\[
m\,\omega \times (\omega \times r).
\]
If ω and r are still orthogonal to each other, then this force is radial as well:
\[
m\,\omega \times (\omega \times r) = -m\|\omega\|^2 r.
\]


This is indeed how the wire must react. After all, Newton’s second law must work 
in just any coordinate system, rotating or not. 

In the rotating axis system, this helps cancel the original centrifugal force, leaving 
the particle at the same (rotating) coordinates (0, ||r||, 0) forever. In the static coor- 
dinates, on the other hand, the centripetal force has another physical job: to make u 
turn. 

In fact, the centripetal force pulls the particle toward the origin in just the correct
amount: it keeps the particle at a constant distance ||r|| from the origin, rotating at a
constant angular velocity ω. This way, the particle makes not only an infinitesimal but
also a global arc around the ω-axis (Fig. 2.9). In fact, this arc is as big as a complete
circle of radius ||r||. As a result, the original velocity vector is always tangent to this
circle: there is no nontangential component anymore:

\[
v = u \quad \text{and} \quad w = 0.
\]

Fig. 2.9 Here, the particle is connected to the origin by a wire. This supplies the centripetal force
required to cancel the original centrifugal force, and keep the particle at the constant distance ||r||
from the origin, rotating at the constant angular velocity ω. Labels in the figure: v = u, now and
at a later time; the centripetal force; m||ω||²r, the centrifugal force; constant ω ⊙.


2.6.3 Euler Force 


In the above example, the particle rotates at the constant angular velocity ω. But what
if ω changed in time? In this case, ω might have a nonzero time derivative:

\[
\omega' = \omega'(t) = \begin{pmatrix} \omega_1'(t) \\ \omega_2'(t) \\ \omega_3'(t) \end{pmatrix} \neq 0.
\]

In the rotating coordinate system, this introduces a new force, the Euler force:
\[
-m\,\omega' \times r.
\]


What is the direction of this new force? In general, we can’t tell. After all, ω′ could
point in just any direction. Indeed, as time goes by, the angular velocity may change
not only magnitude but also direction, making the particle rotate in all sorts of new
r-u planes.

Still, for simplicity, assume that ω′ keeps pointing in the same direction as the
original ω (Fig. 2.10). This way, ω keeps pointing in the same direction all the time:
it only gets bigger and bigger in magnitude:

\[
\|\omega'\| = \frac{d\|\omega\|}{dt} > 0.
\]

Fig. 2.10 Here, the particle rotates counterclockwise faster and faster, so ω′ points in the same
direction as ω. In this case, Euler force pulls the particle clockwise, opposing the original rotation,
and slowing it down. Labels in the figure: v = u at a later time, with faster rotation;
−mω′ × r, the Euler force; ω′ ⊙; ω.


What happens physically? The particle rotates counterclockwise faster and faster.
What could possibly supply the energy required for this? Well, assume that there is
some angular accelerator that keeps increasing the angle that the particle makes per
second. This way, the particle keeps rotating counterclockwise in the same r-u plane,
at a bigger and bigger angle ||ω(t)|| per second. Unfortunately, the Euler force pulls
the particle back clockwise, in an attempt to oppose this motion and slow it down.
So, not all the energy of the accelerator can go to increasing ||ω||: some of it must go
to canceling Euler force. Indeed, to balance Euler force, the accelerator must waste
a force in the amount of

\[
m\|\omega' \times r\| = m\|\omega'\| \cdot \|r\|
\]

counterclockwise. Only the rest could be invested in accelerating the particle
angularly.

As a result, in the rotating axis system, the particle remains at rest. It always keeps
the same (rotating) coordinates: (0, ||r||, 0). This way, in its own subjective (rotating)
“world”, it remains effortless, allowing the rotating axis system to carry it round and
round, faster and faster.


2.6.4 The Earth and Its Rotation 


The rotating coordinates are not just theoretical. They may also be quite real, and 
easy to work with. Let’s go ahead and use them in practice. 

In Figs. 2.9 and 2.10, the particle makes a closed circle around the w-axis. In fact, 
the velocity is tangent to this circle, with no nontangential component at all: 


v=u and w=0. 


Let us now consider a more complicated case, in which v does contain a nontangential
component as well:

\[
v = u + w, \quad \text{where } w \neq 0.
\]

As a matter of fact, we’ve already seen such a case. In Fig. 2.7, however, w might
seem radial. Here, on the other hand, w makes angle η with the horizontal r-axis,
and angle π/2 − η with the vertical ω-axis:

\[
(\omega, w) \geq 0
\]

(Fig. 2.11). Consider, for example, a spaceship standing somewhere in the northern
hemisphere of the Earth, at latitude 0 < η < π/2. Its nose points upward: straight
toward the sky.

Fig. 2.11 A horizontal cross section of the northern hemisphere of the Earth at latitude 0 < η < π/2
(a view from above). Here, we place the origin not at the center of the Earth but at the center of the
cross section. As a result, r is horizontal as well: it lies in the cross section. But w (the direction of
the spaceship) is not: it makes angle η with the horizontal r-axis. The Coriolis force pulls the entire
spaceship westward. Labels in the figure: w, straight toward the sky; the spaceship; u = ω × r,
eastward; latitude η; −2mω × w, the Coriolis force, westward; constant ω ⊙ at the north pole.


The entire Earth rotates eastward: this is why we see the sun rising from the east.
For this reason, in Fig. 2.11, the angular velocity ω points northward: from the page
upward, straight toward your eyes.

In reality, ω is not quite constant: it changes direction, although very slowly. In
fact, as the Earth rotates, ω rotates too. Why? Because the Earth is not a perfect
sphere. For this reason, the north pole is not constant: it loops clockwise. This is very
slow: the loop takes 27,000 years to complete. This is called precession.

Besides, the north pole makes yet another (small) loop clockwise. This loop is 
quicker: it takes 1.3 years to complete. Still, it is very small: just ten meters in radius. 
So, both loops can be ignored. 

In Fig. 2.11, we can see a horizontal cross section of the Earth at latitude
0 < η < π/2 (a view from above, say from the North Star). Recall that the origin can be
picked arbitrarily. Here, we place it not at the center of the Earth but at the center of
the cross section. This way, ||r|| is not the radius of the Earth but the radius of the
cross section. Indeed, r is horizontal: it lies in the cross section in its entirety.

Initially, at t = 0, the spaceship stands on the face of the Earth, in this cross
section. As a matter of fact, r is just the location of the spaceship in the cross section.
This way, the body of the spaceship points obliquely away from the cross section,
making angle η with the horizontal r-axis.

Later on, at t > 0, on the other hand, the spaceship will fly away from the Earth.
Fortunately, the cross section can be extended into an infinite horizontal plane. In
this plane, r will denote the orthogonal projection (or “shadow”) that the spaceship
will make on this plane. This way, r will always be horizontal: it will keep being in
this plane at all times.

If η = 0, then the situation is simple. The cross section meets the equator, and
its center coincides with the center of the Earth. Since r lies in this cross section,
it coincides with the radius of the Earth. This way, r is perpendicular to the face
of the Earth: it points straight into the sky. Fortunately, the centrifugal force in this
direction is well balanced by gravity, which supplies the required centripetal force.

Why isn’t the Earth a perfect sphere? Well, if it were, then some gravity would 
have been lost at the equator to balance the centrifugal force there, as discussed 
above. So, at the equator, gravity would have been a little weaker. This is probably 
how the Earth had evolved in the first place: at the equator, due to weaker gravity, it 
got a bit wide and “fat”. 

If, on the other hand, η > 0, then the situation is more complicated. The cross
section passes not through the center of the Earth but above it. Since r is still
horizontal, it is now shorter than before:

\[
\|r\| = \cos(\eta) \cdot (\text{the radius of the Earth}).
\]

Furthermore, r is no longer perpendicular to the face of the Earth: it also has a new
component that points southward.

What is the norm of this new component? Clearly, it is sin(η)||r||. This produces
a new centrifugal force in the amount of m||ω||² sin(η)||r|| southward, which is not
balanced by gravity. After all, gravity pulls downward, toward the ground, not
northward.

Recall that here we work in the rotating axis system: we stand on the face of the
Earth, unaware of any rotation. Therefore, to us, the above force is real and is truly
felt. Likewise, the spaceship feels it as well, and its route could be affected, including
the shadow r it makes on the (extended) cross section.

Fortunately, the above force can never affect ω, which produced it in the first
place. Can it affect the “shadow” r? Not much: it needs time to act. Therefore, for
small t, its effect on r is as small as t². (See exercises at the end of the chapter about
Newtonian mechanics in [60].) For this reason, it hardly affects the original motion,
illustrated in Fig. 2.11 at the initial time of t = 0. Still, after a while, the effect may
accumulate and grow, and should be taken into account. To balance this, the nose of
the spaceship could point a little obliquely northward, from the start.


2.6.5 Coriolis Force 


Together with the entire Earth, the spaceship rotates eastward, at a rate of angle ||ω||
per second. This produces one component of its velocity, the tangential part:

\[
u = \omega \times r.
\]

Here, though, we work in the rotating axes, spanned by ω, r, and u. In rotating
coordinates, there is no tangential motion at all.

Now, at time t = 0, the spaceship also obtains initial velocity w upward, straight
toward the sky. This is liftoff: the spaceship really feels it. Still, at the same time, it
also feels a new force, which mustn’t be ignored:

\[
-2m\,\omega \times w.
\]

This is the Coriolis force (Fig. 2.11).

What is its direction? Well, it must be perpendicular to the entire ω-w plane.
Thanks to the right-hand rule, this must be westward.

What is the norm of the Coriolis force? Well, this depends on the angle π/2 − η
between ω and w:

\[
2m\|\omega \times w\| = 2m\|\omega\| \cdot \|w\| \sin\left(\frac{\pi}{2} - \eta\right) = 2m\|\omega\| \cdot \|w\| \cos(\eta).
\]


Fortunately, this force doesn’t have much time to act: for small t, its effect on w is
as small as t. (See exercises at the end of the chapter about Newtonian mechanics in
[60].) Thus, it hardly affects the motion, illustrated in Fig. 2.11 at the initial time of 
t = 0. Still, as time goes by, it may accumulate, and mustn’t be ignored. To balance 
it, the spaceship should point a little obliquely eastward, from the start. 
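
To get a feeling for the size and direction of the Coriolis force, consider the following sketch. It is a rough illustration under stated assumptions: the coordinate choice (x along r, y eastward, z along ω), the function name `coriolis_force`, and the spaceship's mass, speed, and latitude are all hypothetical. Only the rotation rate of the Earth, about 7.292 × 10⁻⁵ radians per second, is a physical constant.

```python
import math

def dot(a, b):
    """Inner product (a, b) of two three-dimensional vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Vector product a x b of two three-dimensional vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    """Euclidean norm ||a||."""
    return math.sqrt(dot(a, a))

def coriolis_force(m, omega, w):
    """Coriolis force: -2 m omega x w."""
    return tuple(-2.0 * m * x for x in cross(omega, w))

EARTH_RATE = 7.292e-5        # rotation rate of the Earth in rad/s
eta = math.radians(30.0)     # latitude (assumed)
m = 1000.0                   # mass of the spaceship in kg (assumed)
speed = 50.0                 # liftoff speed ||w|| in m/s (assumed)

# Axes (an assumed convention): x along r, y eastward (along u), z along omega.
omega = (0.0, 0.0, EARTH_RATE)
w = (speed * math.cos(eta), 0.0, speed * math.sin(eta))   # straight toward the sky

force = coriolis_force(m, omega, w)
expected_norm = 2.0 * m * EARTH_RATE * speed * math.cos(eta)

print("Coriolis force:", force)
print("its norm:", norm(force))
print("2 m ||omega|| ||w|| cos(eta):", expected_norm)
```

The only nonzero component is the second one, and it is negative: the force points westward, opposite to u, with the norm given by the formula above.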


2.7 Exercises 


2.7.1 Rotation and Euler Angles 


1. Let Q be an n × n orthogonal matrix. Show that it preserves norm: for every
n-dimensional vector v,

\[
\|Qv\|^2 = (Qv, Qv) = (v, Q^t Q v) = (v, Iv) = (v, v) = \|v\|^2.
\]

2. Let O and Q be two orthogonal matrices of the same order. Show that their
product OQ is an orthogonal matrix as well. Hint: thanks to associativity,

\[
(OQ)^t (OQ) = (Q^t O^t)(OQ) = Q^t (O^t O) Q = Q^t I Q = Q^t Q = I.
\]

3. Let 0 < θ < 2π be some (fixed) angle. Let U be the following 2 × 2 matrix:

\[
U = U(\theta) = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix}.
\]

Show that U rotates the x-axis by angle θ counterclockwise. Hint: apply U to
the standard unit vector that points rightward:

\[
U \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \cos(\theta) \\ \sin(\theta) \end{pmatrix}.
\]


The result is the x̃-unit vector in Fig. 2.12.

Fig. 2.12 The orthogonal matrix U rotates the entire x-y plane by angle θ counterclockwise and
maps it to the new x̃-ỹ plane. Labels in the figure: the x- and y-axes; the x̃-axis, along the
first column of U; the ỹ-axis, along the second column of U.

4. Show that U rotates the y-axis by angle θ counterclockwise as well. Hint: apply
U to the standard unit vector (0, 1)^t.
5. Conclude that U rotates the entire x-y plane by angle θ counterclockwise. Hint:
extend the above linearly. For this purpose, write a general two-dimensional
vector as a linear combination of the standard unit vectors (1, 0)^t and (0, 1)^t.
6. Show that the columns of U are orthogonal to each other.
7. Show that the columns of U are unit vectors of norm 1.
8. Conclude that the columns of U are orthonormal.
9. Conclude that the columns of U span new axes: the x̃- and ỹ-axes in Fig. 2.12.
10. Conclude also that U is an orthogonal matrix.
11. Verify that U indeed satisfies

\[
U^t U = U U^t = I,
\]

where I is the 2 × 2 identity matrix.
12. Conclude that U^t is an orthogonal matrix as well.
13. Verify that the columns of U^t (the rows of U) are indeed orthogonal to each
other as well.
14. Verify that the columns of U^t (the rows of U) are indeed unit vectors of norm 1.
15. Again, interpret U geometrically as a rotation: once applied to a two-dimensional
vector, it rotates it by angle θ counterclockwise. Hint: check this for the standard
unit vectors (1, 0)^t and (0, 1)^t. Then, extend this linearly.
16. Interpret U^t geometrically as the inverse rotation: once applied to a vector, it
rotates it by angle θ clockwise.
17. Interpret the equation U^t U = I geometrically. For this purpose, make sure that
the composition of U^t on top of U is just the identity mapping that changes
nothing. Hint: rotating clockwise cancels rotating counterclockwise.
18. Show that

\[
\det(U) = \det(U^t) = 1.
\]

19. Introduce a third spatial dimension: the z-axis. This makes the new x-y-z-axis
system in R³.
20. Assume that yet another (right-hand) axis system is also given: the x̃-ỹ-z̃-axis
system. How to map the original x-y-z-axis system to the new x̃-ỹ-z̃-axis
system?
21. To do this, use three stages:

• Rotate the entire x̃-ỹ plane by a suitable angle ψ clockwise, until the x̃-axis
hits the x-y plane.
• Then, rotate the entire x-y plane by a suitable angle φ counterclockwise, until
the x-axis matches the up-to-date x̃-axis.
• By now, the up-to-date x- and x̃-axes align with each other. So, all that is left to
do is to rotate the up-to-date y-z plane by a suitable angle θ counterclockwise,
until the y- and z-axes match the up-to-date ỹ- and z̃-axes, respectively.

22. Conclude that to map the original x-y-z-axis system to the new x̃-ỹ-z̃-axis
system, one could use three stages:

• Rotate the entire x-y plane by angle φ counterclockwise.
• Then, rotate the up-to-date y-z plane by angle θ counterclockwise.
• Finally, rotate the up-to-date x-y plane by angle ψ counterclockwise.

The angles φ, θ, and ψ are called Euler angles.
23. Show that this is a triple product of three orthogonal matrices.
24. Conclude that this triple product is an orthogonal matrix as well.
25. Write it in its explicit form:

\[
\begin{pmatrix} U(\phi) & \\ & 1 \end{pmatrix}
\begin{pmatrix} 1 & \\ & U(\theta) \end{pmatrix}
\begin{pmatrix} U(\psi) & \\ & 1 \end{pmatrix}.
\]

26. Does this matrix have determinant 1? Hint: it transfers a right-hand system to
a right-hand system. Furthermore, it is the product of three rotation matrices of
determinant 1.
27. Say this in the terminology of group theory: the special matrices on the right
(that rotate a particular plane) generate the entire group of general rotations in
three spatial dimensions.
28. Consider a 3 × 3 matrix. Show that interchanging two rows in it changes the
sign of the determinant.
29. Conclude that if the matrix has two identical rows, then its determinant must
vanish.
30. Let u, v, and w be real three-dimensional vectors in R³. Show that

\[
u \times u = 0.
\]


31. Show that

\[
u \times v = -(v \times u).
\]

32. Show that

\[
(u, v \times w) = \det \begin{pmatrix} u^t \\ v^t \\ w^t \end{pmatrix}.
\]

33. Assume also that u, v, and w are linearly independent of each other: they don’t
belong to a plane that passes through the origin. Take the original triplet u, v, w,
and copy it time and again in a row:

\[
u, v, w, u, v, w, u, v, w, \ldots
\]

In this list, start from some vector and look ahead to the next two. Show that this
produces the same determinant, no matter whether you started from u or v or w:

\[
\det \begin{pmatrix} u^t \\ v^t \\ w^t \end{pmatrix}
= \det \begin{pmatrix} v^t \\ w^t \\ u^t \end{pmatrix}
= \det \begin{pmatrix} w^t \\ u^t \\ v^t \end{pmatrix}.
\]

Hint: interchange the first and second rows, and then the second and third rows:

\[
\det \begin{pmatrix} u^t \\ v^t \\ w^t \end{pmatrix}
= -\det \begin{pmatrix} v^t \\ u^t \\ w^t \end{pmatrix}
= \det \begin{pmatrix} v^t \\ w^t \\ u^t \end{pmatrix}.
\]

34. Conclude that

\[
(u, v \times w) = (v, w \times u) = (w, u \times v).
\]

35. Assume that u, v, and w satisfy the right-hand rule. Show that, in this case,

\[
(u, v \times w) > 0.
\]

Hint: recall that u, v, and u × v satisfy the right-hand rule (Fig. 2.4). So, to
complete u and v into a right-hand system, one could add either w or u × v.
Thus, both w and u × v must lie on the same side of the u-v plane. As in Fig. 2.3,
they must therefore have a positive inner product:

\[
(w, u \times v) > 0.
\]


36. Could this serve as a new (algebraic) version of the right-hand rule? 
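
The last few exercises (the scalar triple product, its determinant form, and its cyclic symmetry) lend themselves to a quick numerical check. The sketch below is only an illustration: the helper `det3` and the sample vectors are assumptions, not part of the exercises.

```python
def dot(a, b):
    """Inner product (a, b) of two three-dimensional vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Vector product a x b of two three-dimensional vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def det3(a, b, c):
    """Determinant of the 3 x 3 matrix whose rows are a, b, and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

# Sample vectors (assumed), linearly independent.
u = (1.0, 0.5, 0.0)
v = (0.0, 2.0, 1.0)
w = (0.5, 0.0, 3.0)

t_uvw = dot(u, cross(v, w))    # (u, v x w)
t_vwu = dot(v, cross(w, u))    # (v, w x u)
t_wuv = dot(w, cross(u, v))    # (w, u x v)
d = det3(u, v, w)              # determinant with rows u, v, w
d_swapped = det3(v, u, w)      # first two rows interchanged

print("cyclic triple products:", t_uvw, t_vwu, t_wuv)
print("determinant:", d, " after a row swap:", d_swapped)
print("right-handed triplet?", t_uvw > 0)
```

All three triple products coincide with the determinant, a single row interchange flips its sign, and a positive value signals a right-hand triplet.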
2.7.2 Principal Axes


1. Recall that

\[
r = \begin{pmatrix} x \\ y \\ z \end{pmatrix}
\]

is the position of the particle in the Cartesian space. Assume that r is fixed. Show
that rr^t is a 3 × 3 matrix.
2. Show that rr^t is symmetric.
3. What are the eigenvalues and eigenvectors of rr^t?
4. Show that r is an eigenvector of rr^t, with the eigenvalue ||r||². Hint: thanks to
associativity,

\[
(rr^t) r = r (r^t r) = (r, r) r = \|r\|^2 r.
\]

5. Let q be a vector that is orthogonal to r. Show that q is an eigenvector of rr^t,
with the zero eigenvalue. Hint: thanks to associativity,

\[
(rr^t) q = r (r^t q) = (r, q) r = 0.
\]

6. Design two different q’s that are both orthogonal to r and are also orthogonal to
each other.
7. Show that this could be done in many different ways. Still, pick arbitrarily one
particular pair of orthogonal q’s.
8. Conclude that rr^t is positive semidefinite: its eigenvalues are greater than or
equal to zero.
9. Conclude that rr^t has the following diagonal form:

\[
rr^t = O \begin{pmatrix} \|r\|^2 & & \\ & 0 & \\ & & 0 \end{pmatrix} O^t,
\]

where O is an orthogonal matrix, with columns that are proportional to r and
the above q’s. These columns are called principal axes. They span the entire
Cartesian space, using new principal coordinates.
10. In terms of principal coordinates, where is the particle? Hint: it is always at the
same principal coordinates: (||r||, 0, 0).
11. Show that there are two principal axes that could be defined in many different
ways. Still, pick arbitrarily one particular choice.
2.7.3 The Inertia Matrix


1. Let I be the 3 × 3 identity matrix. Define the 3 × 3 inertia matrix of the particle:

\[
A = A(r) = \|r\|^2 I - rr^t.
\]

2. Show that A is symmetric.
3. Show that r is an eigenvector of A, with the zero eigenvalue.
4. Show that the above q’s are eigenvectors of A, with the eigenvalue ||r||².
5. Conclude that A is positive semidefinite: its eigenvalues are greater than or equal
to zero. These eigenvalues are called moments of inertia.
6. Show that A has the diagonal form

\[
A = O \begin{pmatrix} 0 & & \\ & \|r\|^2 & \\ & & \|r\|^2 \end{pmatrix} O^t.
\]

7. Recall that

\[
\omega = \begin{pmatrix} \omega_1 \\ \omega_2 \\ \omega_3 \end{pmatrix} \in \mathbb{R}^3
\]

is the angular velocity. To help visualize things better, we’ve assumed so far that
ω was perpendicular to r. This, however, is not a must: from now on, let’s drop
this assumption. Show that, even if ω is no longer perpendicular to r, Aω still is:

\[
(A\omega, r) = 0.
\]

8. Show that

\[
\|\omega \times r\|^2 = \|\omega\|^2 \|r\|^2 - (\omega, r)^2.
\]

Hint: see Sect. 2.3.3.
9. Conclude that

\[
\|\omega \times r\|^2 = \omega^t A \omega.
\]

Hint: thanks to associativity,

\[
\begin{aligned}
\|\omega \times r\|^2 &= \|\omega\|^2 \|r\|^2 - (\omega, r)^2 \\
&= \|r\|^2 \omega^t \omega - (\omega^t r)(r^t \omega) \\
&= \|r\|^2 \omega^t I \omega - \omega^t (rr^t) \omega \\
&= \omega^t A \omega.
\end{aligned}
\]

10. Show that the entire principal axis system rotates about the ω-axis, carrying the
“passive” particle at angle ||ω|| per second.
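
The properties of the inertia matrix listed in these exercises can be explored numerically before proving them. The following sketch is an illustration only: the function names `inertia_matrix` and `matvec` and the sample vectors are assumptions, with q chosen orthogonal to r.

```python
def dot(a, b):
    """Inner product (a, b) of two three-dimensional vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Vector product a x b of two three-dimensional vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def inertia_matrix(r):
    """A(r) = ||r||^2 I - r r^t, stored as a list of three rows."""
    rr = dot(r, r)
    return [[(rr if i == j else 0.0) - r[i] * r[j] for j in range(3)]
            for i in range(3)]

def matvec(A, x):
    """Matrix-times-vector product A x."""
    return tuple(sum(A[i][j] * x[j] for j in range(3)) for i in range(3))

r = (1.0, 2.0, 2.0)         # sample position, ||r||^2 = 9 (assumed)
q = (2.0, -1.0, 0.0)        # orthogonal to r (assumed)
omega = (0.3, -0.4, 1.2)    # angular velocity, not perpendicular to r (assumed)

A = inertia_matrix(r)
Ar = matvec(A, r)           # zero vector: r has the zero eigenvalue
Aq = matvec(A, q)           # ||r||^2 q: q has the eigenvalue ||r||^2
Aomega = matvec(A, omega)

lhs = dot(cross(omega, r), cross(omega, r))   # ||omega x r||^2
rhs = dot(omega, Aomega)                      # omega^t A omega

print("A r =", Ar)
print("A q =", Aq)
print("(A omega, r) =", dot(Aomega, r))
print("||omega x r||^2 =", lhs, " omega^t A omega =", rhs)
```

The output agrees with the exercises: A annihilates r, scales q by ||r||², maps ω to a vector perpendicular to r, and reproduces ||ω × r||² as a quadratic form.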
2.7.4 Triple Vector Product


1. Let v and w be linearly independent vectors in R³: v is not a scalar multiple
of w, but points in a different direction. Let η be the angle between v and w
(0 < η < π). Let p be yet another vector, perpendicular to v in the v-w plane
(Fig. 2.13). Show that

\[
p \times (v \times w) = (p, w) v.
\]

Hint: use the fact that

\[
\cos\left(\frac{\pi}{2} + \eta\right) = -\sin(\eta).
\]

2. Conclude that

\[
p \times (w \times v) = -(p, w) v.
\]

3. In the latter formula, interchange the roles of v and w.
4. Conclude that

\[
q \times (v \times w) = -(q, v) w,
\]

where q is perpendicular to w in the v-w plane.
5. Add these formulas to each other:

\[
(p + q) \times (v \times w) = (p, w) v - (q, v) w = (p + q, w) v - (p + q, v) w,
\]

where p is perpendicular to v and q is perpendicular to w in the v-w plane.
6. Show that every vector u in the v-w plane could be written as u = p + q,
where p is perpendicular to v and q is perpendicular to w in the v-w plane. Hint:
because v and w are linearly independent of each other, so are also p and q.
Geometrically, this exercise is just the parallelogram rule (Fig. 1.2).
7. Conclude that

\[
u \times (v \times w) = (u, w) v - (u, v) w,
\]

where u is just any vector in the v-w plane.

Fig. 2.13 In the v-w plane, p is orthogonal to v, but not to w

Fig. 2.14 The more general case, in which r is not necessarily perpendicular to ω. Let r̂ be the part
of r that is perpendicular to ω. Then, the centrifugal force is m||ω||²r̂ rightward. Labels in the
figure: the angular velocity ω; m||ω||²r̂, the centrifugal force; the origin.

8. Extend the above formula to a yet more general u that may lie outside the v-w
plane as well. Hint: the component of u that is perpendicular to the v-w plane
contributes nothing to either side of the above formula.
9. Use the above formula to obtain once again the orthogonal decomposition of the
linear momentum at the end of Sect. 2.4.2. Hint: set u = v = r (the position of
the particle) and w = p (the linear momentum).
10. Use the above formula to write the centrifugal force (Fig. 2.8) in a more general
case, in which ω and r are not necessarily perpendicular to each other, so the
rotating axes do not necessarily align with them anymore. Hint:

\[
\begin{aligned}
-\omega \times (\omega \times r) &= -(\omega, r)\,\omega + (\omega, \omega)\, r \\
&= \|\omega\|^2 \left( r - \frac{(\omega, r)}{\|\omega\|^2}\, \omega \right) \\
&= \|\omega\|^2 \hat{r},
\end{aligned}
\]

where r̂ is the part of r that is perpendicular to ω (Fig. 2.14).
11. Prove the above result in a more geometrical way. Hint: note that ω × r = ω × r̂.
Therefore, ω × (ω × r) = ω × (ω × r̂).
12. Does the centrifugal force really exist? Hint: only in the rotating axis system! In
the static axis system, on the other hand, it has no business to exist. Indeed, in
Figs. 2.6 and 2.7, the velocity is often constant, meaning equilibrium: no force
at all.
13. Does the centripetal force exist? Hint: only if supplied by some source, such as
gravity.
14. What is the role of the centripetal force? Hint: in the rotating axis system, it
balances the centrifugal force, and cancels it out. This way, there is no force
at all, so the particle always has the same (rotating) coordinates, and is carried
“passively” by the rotating axis system round and round forever. In the static
axis system, on the other hand, the centripetal force has a more “active” role: to
make u turn. This indeed makes the particle go round and round, as required.
15. At the end of Sect. 2.5.3, there is a relation between the angular velocity and
the angular momentum. Extend it to the more general case, in which ω is no
longer perpendicular to r, so the rotating axes no longer align with it. For this
purpose, assume that the angular velocity is given. How to uncover the angular
momentum? Show that, if w in Fig. 2.7 is radial, then the angular momentum
could be obtained from the angular velocity:

\[
\begin{aligned}
r \times p &= m \cdot r \times v \\
&= m \cdot r \times (u + w) \\
&= m \cdot r \times u \\
&= m \cdot r \times (\omega \times r) \\
&= m \left( (r, r)\,\omega - (r, \omega)\, r \right) \\
&= m\|r\|^2 \left( \omega - \frac{(r, \omega)}{\|r\|^2}\, r \right) \\
&= m\|r\|^2 \hat{\omega},
\end{aligned}
\]

where ω̂ is the part of ω that is perpendicular to r (Fig. 2.15).
16. Prove the above result in a more geometrical way. Hint: note that ω × r = ω̂ × r.
Therefore, r × (ω × r) = r × (ω̂ × r).
17. Conclude that, if w in Fig. 2.7 is radial, then the angular momentum has a yet
simpler form:

\[
r \times p = m A \omega,
\]

where A = A(r) is the inertia matrix introduced above.
18. Look at things the other way around. Assume that the angular momentum is now
available. How to define the angular velocity? This could be done as follows: let
ω have some radial component. Define its other component by

\[
\hat{\omega} = \frac{r \times p}{m\|r\|^2}.
\]


Show that, this way, w in Fig. 2.7 must be radial. 
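
The triple vector product formula, and its use for the centrifugal force in the general case, can be tested on arbitrary vectors. The sketch below is an illustration under stated assumptions: the function name `triple_product_formula` and all sample vectors are hypothetical, and u is deliberately chosen outside the v-w plane.

```python
def dot(a, b):
    """Inner product (a, b) of two three-dimensional vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Vector product a x b of two three-dimensional vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def triple_product_formula(u, v, w):
    """Right-hand side of the formula: (u, w) v - (u, v) w."""
    return tuple(dot(u, w) * a - dot(u, v) * b for a, b in zip(v, w))

# Sample vectors (assumed); u does not lie in the v-w plane.
u = (1.0, -2.0, 0.5)
v = (0.3, 1.0, 2.0)
w = (-1.0, 0.4, 1.5)

lhs = cross(u, cross(v, w))
rhs = triple_product_formula(u, v, w)

# Centrifugal force per unit mass when omega is not perpendicular to r.
omega = (0.2, 0.1, 1.0)   # sample angular velocity (assumed)
r = (2.0, -1.0, 0.5)      # sample position (assumed)
ww = dot(omega, omega)
r_perp = tuple(x - dot(omega, r) / ww * y for x, y in zip(r, omega))  # part of r perpendicular to omega
centrifugal = tuple(-x for x in cross(omega, cross(omega, r)))
expected = tuple(ww * x for x in r_perp)

print("u x (v x w):        ", lhs)
print("(u, w)v - (u, v)w:  ", rhs)
print("-omega x (omega x r):", centrifugal)
print("||omega||^2 r_perp:  ", expected)
```

Both pairs agree up to rounding, which supports the extension of the formula to a general u and the general form of the centrifugal force.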


2.7.5 Conservation of Angular Momentum 


1. 


2. 


In Fig. 2.15, the angular momentum is not conserved! After all, it is proportional 
to w, which changes direction to keep pointing obliquely (upward and inward). 
Why isn’t angular momentum conserved? Hint: only ina closed (isolated) system 
is angular momentum conserved. The particle, however, is not isolated: there is 
an external force acting upon it—a horizontal centripetal force, which makes it 
rotate about the vertical w-axis. 

To supply this centripetal force, introduce a second particle at position 
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angular momentum: mljr||?@ 


angular velocity: w if) va w 


r — position of particle 
origin 


Fig. 2.15 Let be the part of w that is perpendicular to r. If w is radial, then the angular momentum 
is m||r||?&. This is nonconstant: it must change direction to point not only upward but also inward. 
Why isn’t angular momentum conserved? Because the system is not closed or isolated: a horizontal 
centripetal force must be supplied from the outside 


total angular momentum: 
m|r|?(@ + a7) 


A 
angular momentum of angular momentum of 
first particle: m||r||?& second particle: m||r||?a~ 
wW 
a Tg 
second particle: r— r: first particle 
origin 


Fig. 2.16 The particles atr andr~ attract each other just enough to supply the horizontal centripetal 
force required to make them rotate together about the vertical w-axis. This is a closed isolated system: 
no external force acts upon it. This is why the total angular momentum is now conserved: w + w7 
keeps pointing straight upward 


on the other side of the vertical w-axis (Fig. 2.16). Its inertia matrix is A(r7). 
What are its eigenvectors? Could they be the same as those of A(r)? Hint: only 
if r~ is proportional or perpendicular to r. 


2.7 


12. 


13. 
14. 
15. 
16. 


17. 
18. 
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. What are the principal axes of the second particle? Are they the same as those 


of the first particle? 


. In terms of its own principal coordinates, where is the second particle? Hint: it 


is always at the same principal coordinates: (|r|, 0, 0). 


. Let @ be the part of w perpendicular to r~. Show that the angular momentum 


of the second particle is 


ml||r—||°O" = mA(r7)w. 


. Together, these particles make a closed system: for a suitable ||r||, they attract 


each other just enough to supply the horizontal centripetal force required to 
make them rotate together about the vertical w-axis. This is called the two-body 
problem. What is the total angular momentum? Where does it point? Must it be 
vertical? 


. Show that the total angular momentum is now conserved. Hint: the sum w + @7 


always points straight upward, with no inward component anymore. 


. Define the inertia matrix of the two-body system: 


B=B(r)=A(r)+A(r). 
Write the total angular momentum as 


mllr|l? (f@ +07) =m (Aw + Aw) = mBu. 


9. Without calculating $B$ explicitly, show that it is symmetric.

10. Conclude that $B$ has three orthogonal eigenvectors. Hint: see Chap. 1, Sect. 1.9.5.

11. Could these new eigenvectors be the same as those of $A(r)$? Hint: only if $r^-$ is
proportional or perpendicular to $r$.

12. Show that $B$ is still positive semidefinite: its eigenvalues are greater than or equal
to zero. Hint: every three-dimensional vector $v$ could be decomposed as a linear
combination of eigenvectors of $A(r)$ or $A(r^-)$. Therefore,

\[
(v, Bv) = (v, A(r)v) + (v, A(r^-)v) \ge 0.
\]

13. Show that if $r^- \ne -r$, then $B$ is also positive definite: its eigenvalues are strictly
positive. These are the moments of inertia of the two-body system.

14. Design an eigenvector of $B$ that is perpendicular to both $r$ and $r^-$. Hint: if
$r^- \ne -r$, then take $r \times r^-$.

15. What is its eigenvalue? Hint: $2\|r\|^2$.

16. Still, this eigenvector depends on $r$. Design yet another eigenvector that doesn’t
depend on $r$, and remains the same throughout the rotation. Hint: take the vertical
vector $(0, 0, 1)^t$.

17. What is its eigenvalue? Hint: $2(\|r\|^2 - z^2) = 2(x^2 + y^2)$.

18. Could it possibly be negative or zero? Hint: no! If it were zero, then $r^- = r$,
which is impossible.
19. Conclude that our vertical $\omega$ remains an eigenvector of $B$ all the time.

20. Conclude that the total angular momentum $mB\omega$ always remains vertical.

21. Conclude once again that the total angular momentum is indeed conserved.

22. Note that this is just a special case of a more general theorem: in a general closed
system, if (and only if) the angular velocity is an eigenvector of the inertia matrix
of the system, then it remains constant, unchanged throughout the entire rotation
around it, and always proportional to the total angular momentum of the system,
which remains constant as well.

23. Use the orthogonal eigenvectors of $B$ to design principal axes for the two-body
system.

24. Could they be the same as those of the first particle alone? Hint: only if $r^-$ is
proportional or perpendicular to $r$.

25. Which principal axis is independent of $r$, and remains the same throughout the
entire rotation? Hint: the vertical z-axis.

26. In Fig. 2.8, even if $\omega$ is not perpendicular to $r$, show that the centrifugal force
could be written simply as

\[
-m\omega \times (\omega \times r) = mA(\omega)r.
\]
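The eigenvector hints in the two-body exercises can be checked numerically. The following is only a sketch under stated assumptions: the one-particle inertia matrix is taken to be A(r) = ||r||^2 I - r r^t, the second particle is taken to sit at the mirror image (-x, -y, z) of r across the vertical axis, and the position (1, 2, 3) and all function names are hypothetical.

```python
def inertia(r):
    """A(r) = ||r||^2 I - r r^t (assumed form of the one-particle inertia matrix)."""
    norm2 = sum(x * x for x in r)
    return [[(norm2 if i == j else 0.0) - r[i] * r[j] for j in range(3)]
            for i in range(3)]

def matvec(M, v):
    """Matrix-vector product for a 3 x 3 matrix stored as nested lists."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def cross(a, b):
    """Vector product a x b in three dimensions."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

r = [1.0, 2.0, 3.0]                 # hypothetical position of the first particle
r_minus = [-r[0], -r[1], r[2]]      # assumed mirror image across the vertical axis
A1, A2 = inertia(r), inertia(r_minus)
B = [[A1[i][j] + A2[i][j] for j in range(3)] for i in range(3)]

vertical = [0.0, 0.0, 1.0]
B_vertical = matvec(B, vertical)    # compare with 2(x^2 + y^2) * vertical
normal = cross(r, r_minus)          # perpendicular to both r and r_minus
B_normal = matvec(B, normal)        # compare with 2 ||r||^2 * normal

print(B_vertical, B_normal)
```

For this particular r, the vertical vector is scaled by 2(1 + 4) = 10 and the vector r x r^- by 2 * 14 = 28, in agreement with the hints.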


Chapter 3
Markov Chain in a Graph


So far, we’ve mostly used small matrices, with a clear geometrical meaning: 2 x 2 
matrices transform the Cartesian plane, and 3 x 3 matrices transform the entire 
Cartesian space. What about yet bigger matrices? Fortunately, they may still have 
a geometrical meaning of their own. Indeed, in graph theory, they may help design 
a weighted graph, and model a stochastic flow in it. This makes a Markov chain, 
converging to a unique steady state. This has a practical application in modern search 
engines on the Internet [44]. 


3.1 Characteristic Polynomial and Spectrum 


3.1.1 Null Space and Characteristic Polynomial 


In this chapter, we'll see how useful matrices are in graph theory. This will help 
design a practical ranking algorithm for search engines on the Internet. Before going 
into this, let’s see some more background in linear algebra. 

Let A be a square (real or complex) matrix of order n. In many cases, one might 
want to focus on the eigenvalues alone. After all, they tell us how A acts on impor- 
tant nonzero vectors: the eigenvectors. How to characterize the eigenvalues, without 
solving for the eigenvectors? 

For this purpose, let $I$ be the identity matrix of order $n$. Let $\lambda$ be some (unknown)
eigenvalue. Then $A - \lambda I$ maps the (unknown) eigenvector to zero. In other words,
the eigenvector belongs to the null space of $A - \lambda I$ (Chap. 1, Sect. 1.9.2).

So, there is no need to know the eigenvector explicitly: it is sufficient to know
that $A - \lambda I$ maps it to the zero vector. This means that $A - \lambda I$ has no inverse. After
all, no matrix in the world could possibly map the zero vector back to the original
nonzero eigenvector. Thus, $A - \lambda I$ must have zero determinant:

\[
\det(A - \lambda I) = 0.
\]


This is called the characteristic equation. It characterizes the eigenvalue $\lambda$ in a simple
algebraic way, as required. Even before $\lambda$ is known, we already know something about
it: it solves the characteristic equation

\[
\det(A - \mu I) = 0,
\]

where $\mu$ stands for a general (unspecified) complex number: the independent variable.
Once the characteristic equation is solved, one could go ahead and solve for the
eigenvector too, if necessary.

The left-hand side $\det(A - \mu I)$ is called the characteristic polynomial in the
independent variable $\mu$. So, the original eigenvalue is a root of the characteristic
polynomial: a special argument, for which the characteristic polynomial vanishes,
and the characteristic equation is solved. There is at least one root, and at most $n$
distinct roots, each with its own private eigenvector.
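For a 2 x 2 matrix, the characteristic polynomial can be written down by hand: det(A - mu I) = mu^2 - trace(A) mu + det(A). The sketch below (function names and the sample matrix are illustrative assumptions, not taken from the text) finds the eigenvalues as roots of this polynomial, without ever solving for an eigenvector.

```python
import cmath

def char_poly_2x2(A):
    """Coefficients (1, p, q) of det(A - mu I) = mu^2 + p*mu + q."""
    (a, b), (c, d) = A
    return 1.0, -(a + d), a * d - b * c

def eigenvalues_2x2(A):
    """Roots of the characteristic polynomial, by the quadratic formula."""
    _, p, q = char_poly_2x2(A)
    root = cmath.sqrt(p * p - 4 * q)
    return (-p + root) / 2, (-p - root) / 2

def det_shifted(A, mu):
    """det(A - mu I), evaluated directly from the entries."""
    (a, b), (c, d) = A
    return (a - mu) * (d - mu) - b * c

A = [[2.0, 1.0], [1.0, 2.0]]        # hypothetical sample matrix
lam1, lam2 = eigenvalues_2x2(A)
print(lam1, lam2, det_shifted(A, lam1), det_shifted(A, lam2))
```

Here the trace is 4 and the determinant is 3, so the roots are 3 and 1, and the characteristic polynomial indeed vanishes at both.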


3.1.2 Spectrum and Spectral Radius 


Let’s place the eigenvalues in a new set:

\[
\mathrm{spectrum}(A) = \{ \mu \in \mathbb{C} \mid \det(A - \mu I) = 0 \}.
\]

This is the spectrum of $A$: the set of eigenvalues.
How large could an eigenvalue be in magnitude? The largest magnitude is called the spectral radius:

\[
\rho(A) = \max_{\mu \in \mathrm{spectrum}(A)} |\mu|.
\]

Clearly, the spectral radius is a nonnegative real number. How large could it be? Well,
it can’t exceed the maximal row-sum:

\[
\rho(A) \le \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{i,j}|,
\]

or the maximal column-sum:

\[
\rho(A) \le \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{i,j}|.
\]

This is proved in the exercises below. 
Note that $\rho(A)$ is not necessarily an eigenvalue in its own right. After all, even
if $A$ is a real matrix, its eigenvalues (and eigenvectors) are not necessarily real: they
could be complex as well. For this reason, $\rho(A)$ is not necessarily an eigenvalue: it
is just the absolute value of a (complex) eigenvalue.
We already know that, if $A$ is Hermitian, then its eigenvalues are real:

\[
A = A^h \implies \mathrm{spectrum}(A) \subset \mathbb{R}.
\]

(Here, ‘$\subset$’ means “contained in”.) In this case, either $\rho(A)$ or $-\rho(A)$ must be an
eigenvalue. Nevertheless, even in this case, the eigenvectors may still be complex.

Only if $A$ is a real symmetric matrix must it have not only real eigenvalues but also
real eigenvectors. Of course, the real eigenvector is defined up to a scalar multiple, so
it could always be multiplied by a complex scalar to produce a complex eigenvector
as well. Still, to make your life easy, better design a real eigenvector, and stick to
it. For this purpose, given a complex eigenvector, just look at its real part (or its
imaginary part, whichever is nonzero): this is indeed a real eigenvector, easy to use.
(See exercises at the end of Chap. 1.)
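The bounds on the spectral radius, and the fact that it need not be an eigenvalue itself, can be illustrated on small matrices. This is a minimal sketch for the 2 x 2 case only; the helper names and both sample matrices are assumptions made for illustration.

```python
import cmath

def eigenvalues_2x2(A):
    """Both roots of det(A - mu I) = mu^2 - trace*mu + det."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return [(trace + root) / 2, (trace - root) / 2]

def spectral_radius(A):
    """Largest eigenvalue magnitude."""
    return max(abs(mu) for mu in eigenvalues_2x2(A))

def max_row_sum(A):
    return max(sum(abs(x) for x in row) for row in A)

def max_column_sum(A):
    return max(sum(abs(row[j]) for row in A) for j in range(len(A)))

M = [[1.0, 2.0], [3.0, 4.0]]     # real, nonsymmetric
R = [[0.0, -1.0], [1.0, 0.0]]    # rotation by 90 degrees: eigenvalues +i and -i

print(spectral_radius(M), max_row_sum(M), max_column_sum(M))
print(eigenvalues_2x2(R), spectral_radius(R))
```

For M, the spectral radius (about 5.37) stays below both the maximal row-sum 7 and the maximal column-sum 6. For the real matrix R, the spectral radius is 1, yet 1 is not an eigenvalue: the eigenvalues are purely imaginary.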


3.2 Graph and Its Matrix 


3.2.1 Weighted Graph 


What is a graph? It is modeled in terms of two sets: N and E. N contains the nodes, 
and E contains the edges. Each edge is a pair of two nodes. Geometrically, the edge 
leads from one node to the other. 

In a weighted graph, each edge is assigned a nonnegative number: its weight, or
the amount it could carry. This way, the edge $(j,i) \in E$ (leading from node $j$ to
node $i$) has the weight $a_{i,j} \ge 0$. This lets the amount $a_{i,j}$ flow from node $j$ to node
$i$. If, on the other hand, $(j,i) \notin E$, then no edge leads from node $j$ to node $i$, and
$a_{i,j} = 0$.


3.2.2 Markov Matrix


Let’s index the nodes by the index

\[
i = 1, 2, 3, \ldots, |N|,
\]

where $|N|$ is the total number of nodes. Consider the $j$th node ($1 \le j \le |N|$). How
much weight flows from it to all nodes in $N$? Well, assume that this weight sums to 1:

\[
\sum_{i=1}^{|N|} a_{i,j} = 1.
\]

This way, $a_{i,j}$ can be viewed as the probability that a particle based at node $j$ would
pick edge $(j, i)$ to move to node $i$. In particular, if $(j, j) \in E$, then there is a small
circle: an edge leading from node $j$ to itself. In this case, the particle could stay at
node $j$. The probability for this is $a_{j,j} \ge 0$.

The weights (or probabilities) can now be placed in a new $|N| \times |N|$ matrix:

\[
A = \left( a_{i,j} \right)_{1 \le i,j \le |N|}.
\]

This is the probability matrix, or the Markov matrix: its columns sum to 1. Let’s use
it to describe a discrete stochastic flow.


3.2.3 Example: Uniform Probability 


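Consider those edges issuing from node $j$:

\[
\mathrm{outgoing}(j) = \{ (j,i) \in E \mid i \in N \} \subset E.
\]

This way, $|\mathrm{outgoing}(j)|$ is the total number of those edges issuing from $j$. Consider
a simple example, with a uniform probability: the particle has no preference, so it is
equally likely to move to any neighbor node:

\[
a_{i,j} \equiv
\begin{cases}
\dfrac{1}{|\mathrm{outgoing}(j)|} & \text{if } (j,i) \in E \\
0 & \text{if } (j,i) \notin E.
\end{cases}
\]

(Here it is assumed that every node has at least one outgoing edge.)
Why is this a legitimate probability? Because the columns sum to 1:

\[
\begin{aligned}
\sum_{i=1}^{|N|} a_{i,j}
&= \sum_{\{ i \in N \mid (j,i) \in E \}} \frac{1}{|\mathrm{outgoing}(j)|} \\
&= \sum_{(j,i) \in \mathrm{outgoing}(j)} \frac{1}{|\mathrm{outgoing}(j)|} \\
&= |\mathrm{outgoing}(j)| \cdot \frac{1}{|\mathrm{outgoing}(j)|} \\
&= 1.
\end{aligned}
\]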
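The uniform-probability construction can be sketched in a few lines. The function name, the zero-based node indices, and the small edge list are assumptions made for illustration; each edge is a pair (j, i) leading from node j to node i, as in the text.

```python
def uniform_markov_matrix(num_nodes, edges):
    """Return A with A[i][j] = 1/|outgoing(j)| if (j, i) is an edge, else 0.

    Nodes are numbered 0, 1, ..., num_nodes - 1 (zero-based, unlike the text).
    """
    edge_set = set(edges)                      # ignore repeated edges
    outgoing = [0] * num_nodes
    for j, _ in edge_set:
        outgoing[j] += 1
    if any(count == 0 for count in outgoing):
        raise ValueError("every node needs at least one outgoing edge")
    A = [[0.0] * num_nodes for _ in range(num_nodes)]
    for j, i in edge_set:
        A[i][j] = 1.0 / outgoing[j]
    return A

# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, and a loop 2 -> 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 2)]
A = uniform_markov_matrix(3, edges)
column_sums = [sum(A[i][j] for i in range(3)) for j in range(3)]
print(A, column_sums)
```

Each column sums to 1, so the result is a Markov matrix in the sense of Sect. 3.2.2.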


3.3 Flow and Mass


3.3.1 Stochastic Flow: From State to State 


Let’s use the weights (or probabilities) to form a new discrete flow, step by step. The 
flow is stochastic: we can never tell for sure what the result of a particular step is, 
but only how likely it is to happen. 
So far, we’ve assigned weights to the edges. These weights are permanent: they
are assigned once and for all, and never change any more.

Assume now that each node contains a nonnegative mass. These masses are dif- 
ferent from the above weights: they may change dynamically, step by step. 

At the beginning, node $j$ contains the initial mass $u_j \ge 0$. These masses can be
placed in a new $|N|$-dimensional column vector:

\[
u \equiv \left( u_1, u_2, u_3, \ldots, u_{|N|} \right)^t.
\]

This is the initial state: the mass distribution among the nodes in $N$.

Next, what happens to the mass in the $j$th node? Well, in the first step, $u_j$ may
split into tiny bits, each flowing to a different neighbor node: each edge of the form
$(j, i)$ transfers the amount $a_{i,j} u_j$ from node $j$ to node $i$. This way, the original mass
is never lost, but only redistributed among the neighbor nodes:

\[
\sum_{i=1}^{|N|} a_{i,j} u_j = u_j \sum_{i=1}^{|N|} a_{i,j} = u_j \cdot 1 = u_j.
\]

This is indeed mass conservation. This will be discussed further below.

In the first step, node $j$ may lose its original mass through outgoing edges. Fortunately,
at the same time, it may also gain some new mass through incoming edges.
In fact, through each incoming edge of the form $(i, j) \in E$, it gains the new mass of
$a_{j,i} u_i$. Thus, at node $j$, the mass has changed as follows:

\[
u_j \to \sum_{i=1}^{|N|} a_{j,i} u_i = (Au)_j.
\]

This is true for each and every node $j \in N$. In summary, the mass distribution has
changed from the original state $u$ to the new state $Au$:

\[
u \to Au.
\]

This completes the first step. The same procedure can now repeat in the next step as
well, to change the state from

\[
Au \to A(Au) = A^2 u,
\]

and so on.
The process may then continue step by step forever. Thanks to mass conservation,
the total mass never changes.
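One step of the flow is just a matrix-vector product. The sketch below applies two steps to a small Markov matrix; the matrix, the initial state, and the name `step` are hypothetical choices made for illustration.

```python
def step(A, u):
    """One step of the flow: returns Au, with (Au)_j = sum_i a_{j,i} u_i."""
    n = len(u)
    return [sum(A[j][i] * u[i] for i in range(n)) for j in range(n)]

# Hypothetical Markov matrix: every column sums to 1.
A = [[0.0, 0.0, 0.5],
     [0.5, 0.0, 0.0],
     [0.5, 1.0, 0.5]]

u = [1.0, 0.0, 0.0]      # initially, all the mass sits in the first node
u1 = step(A, u)          # state after one step: Au
u2 = step(A, u1)         # state after two steps: A^2 u

print(u1, sum(u1))
print(u2, sum(u2))
```

After one step the mass is split evenly between the second and third nodes, and after two steps the state is (0.25, 0, 0.75). In both cases the total mass is still 1.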
3.3.2 Mass Conservation


What is mass conservation? It means that the total mass remains unchanged. Indeed,
since the columns of $A$ sum to 1, the total mass after the step is the same as before:

\[
\sum_{i=1}^{|N|} (Au)_i
= \sum_{i=1}^{|N|} \sum_{j=1}^{|N|} a_{i,j} u_j
= \sum_{j=1}^{|N|} \left( \sum_{i=1}^{|N|} a_{i,j} \right) u_j
= \sum_{j=1}^{|N|} u_j.
\]


The same is true in subsequent steps as well. By mathematical induction, mass is 
preserved throughout the entire process. 

This is indeed an infinite process that may go on and on forever. Does it converge 
to a steady state? To answer this, we must study the spectrum of A. 


3.4 The Steady State 

3.4.1 The Spectrum of Markov Matrix 

Fortunately, the spectral radius of $A$ is as small as

\[
\rho(A) \le \max_{1 \le j \le |N|} \sum_{i=1}^{|N|} |a_{i,j}| = 1
\]

(see exercises below). Is it exactly

\[
\rho(A) = 1?
\]


Well, to check on this, let’s look at the transpose matrix $A^t$. We already know that its
eigenvalues are the complex conjugate of those of $A$ (Chap. 1, Sect. 1.9.3). Therefore,
both have the same spectral radius:

\[
\rho(A^t) = \rho(A) \le 1.
\]

What else do we know about $A^t$? Well, its rows sum to 1. Therefore, we already have
one eigenvector: this is the constant $|N|$-dimensional vector

\[
c \equiv (1, 1, 1, \ldots, 1)^t,
\]

satisfying

\[
A^t c = c.
\]

So, for $A^t$, 1 is indeed an eigenvalue. As a result,

\[
\rho(A^t) = 1.
\]

Is 1 an eigenvalue of $A$ as well? It sure is. After all,

\[
\bar{1} = 1
\]

is an eigenvalue of $A$ as well. Thus,

\[
\rho(A) = 1
\]

as well.

In summary, both $A$ and $A^t$ share a common eigenvalue: 1. Still, they don’t
necessarily share the same eigenvector. For $A^t$, this is the constant vector. For $A$, on
the other hand, it could be completely different. Fortunately, we can still tell what it
looks like. For this purpose, we need a new assumption.


3.4.2 Converging Markov Chain 


Let’s make a new assumption: our graph is well-connected—no node could drop 
from it. In other words, all nodes in N are important—no node is redundant. Every 
node is valuable—it may be used to receive some mass at some step. Dropping it 
may, therefore, spoil the entire flow. 

Because all nodes may be used to take some mass, N has no invariant subset, from 
which no mass flows away. This means that the flow is global: there is no autonomous 
subgraph, in which the original mass circulates forever, never leaking to the rest of 
the graph. 
In this case, we say that $A$ is irreducible. This is a most desirable property: under
a mild extra condition (aperiodicity, which holds, for example, if some node has an
edge leading to itself), it guarantees that the infinite flow makes a Markov chain that
converges to a unique steady state.

Why is this true? Well, we already know that $A$ has eigenvalue 1. Thanks to the
above assumption, we can now tell what the corresponding eigenvector looks like:


• $A$ has a unique eigenvector $v$ of eigenvalue 1:

\[
Av = v
\]

(up to a scalar multiple).
• All its components are positive:

\[
v_j > 0, \quad 1 \le j \le |N|.
\]

• To have this property, $v$ is defined up to multiplication by a positive number.
• 1 is maximal: all other eigenvalues are strictly smaller than 1 in magnitude
(provided that the chain is also aperiodic):

\[
\mu \in \mathrm{spectrum}(A),\ \mu \ne 1 \implies |\mu| < 1.
\]

This is indeed the Perron-Frobenius theory [73].
Thanks to these properties, the infinite flow makes a Markov chain that converges
to a unique steady state. Even if the mass is initially concentrated in just one node,
it will eventually get distributed globally among all nodes.


3.4.3 The Steady State 


To design the steady state, let $u$ be the initial state. Let us write

\[
u = v + w,
\]

where $v$ is the unique eigenvector corresponding to 1 (Sect. 3.4.2), and the residual
(or remainder, or error) $w$ is a linear combination of the other (pseudo-) eigenvectors,
associated with eigenvalues smaller than 1 in magnitude. This way, as the process
progresses, we have

\[
\| A^n w \| \to_{n \to \infty} 0,
\]

so

\[
A^n u = A^n (v + w) = A^n v + A^n w = v + A^n w \to_{n \to \infty} v.
\]

Thus, the infinite flow converges to the steady state $v$.
We’re not done yet. After all, $v$ is not defined uniquely. In fact, although $v$ contains
positive components only, these components are defined up to multiplication by a
positive constant. How to specify $v$ uniquely?

Fortunately, thanks to mass conservation, we can tell the total mass in $v$: it is the
same as the total mass in the initial state $u$:

\[
\sum_{j=1}^{|N|} v_j = \sum_{j=1}^{|N|} u_j.
\]

This determines $v$ uniquely, as required. Moreover, this also shows that the error $w$
must have zero total mass. After all, unlike $u$ and $v$, $w$ could contain not only positive
but also negative masses, which cancel each other, and sum to zero.
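The convergence to the steady state can be observed by repeating the step u -> Au until the state stops changing. This is a minimal sketch, not a production solver: the matrix is a hypothetical irreducible example with a loop at its last node, and the tolerance, step limit, and function names are assumptions.

```python
def step(A, u):
    """One step of the flow: returns Au."""
    n = len(u)
    return [sum(A[j][i] * u[i] for i in range(n)) for j in range(n)]

def steady_state(A, u, tol=1e-12, max_steps=10_000):
    """Iterate u -> Au until two consecutive states agree to within tol."""
    for _ in range(max_steps):
        nxt = step(A, u)
        if max(abs(a - b) for a, b in zip(nxt, u)) < tol:
            return nxt
        u = nxt
    raise RuntimeError("no convergence (the chain may be periodic)")

# Hypothetical irreducible Markov matrix (columns sum to 1).
A = [[0.0, 0.0, 0.5],
     [0.5, 0.0, 0.0],
     [0.5, 1.0, 0.5]]

u = [1.0, 0.0, 0.0]        # total mass 1, concentrated in the first node
v = steady_state(A, u)
print(v, sum(v))
```

For this matrix, solving Av = v by hand with total mass 1 gives v = (2/7, 1/7, 4/7), and the iteration approaches the same vector. Because the total mass is conserved at every step, no extra normalization is needed.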


3.4.4 Search Engine in the Internet 


How to use a Markov chain in practice? Well, let’s use it to model a communication
network, such as the Internet. Each site is considered as an individual node. A link
from one site to another makes an edge. This way, each site may contain a few links:
its outgoing edges. For simplicity, assume that the probability to click on such a link
is uniform, as in Sect. 3.2.3.

You, the surfer, play here the role of the particle. Initially, you are at the $k$th site,
for some $1 \le k \le |N|$. While surfing, you may use a link to move to another site.
Still, if this is a good site, and you are likely to stay there for long, then $a_{k,k}$ could
be nearly 1.

In stochastic terms, you must eventually approach the steady state: $v$. How likely
are you to enter the $l$th site ($1 \le l \le |N|$)? Just look at the positive component $v_l$,
and see how large it is.

Still, you are not the only one. Like you, there are many other surfers. Thus, mass
could stand here for the number of surfers at a particular site. The initial mass $u$ tells
us the initial distribution of surfers. The steady state $v$, on the other hand, tells us the
“final” distribution: how likely the surfers are to enter a particular site eventually.

Now, suppose that you are interested in some keyword, and want to search the 
web for those sites that contain it. Still, there are many correct answers: many sites 
may contain the same keyword. How should the search engine order (or rank) the 
answers? This is called the ranking problem. 

For simplicity, assume that the search engine knows nothing about your surfing 
habits (which is highly unlikely these days). How should it rank the answers to your 
search? Well, it should start from the maximal component in v. If it indeed contains 
the keyword, then it should be ranked first. After all, this must be a popular site, in 
which you must be interested. Next, rank the second maximal component in v, and 
so on. 

Still, the web is dynamic, not static. Every day, new sites are added, and old
sites drop. For this reason, a good search engine should update $A$ often, and
recalculate $v$ often. This is indeed an eigenvector problem. (Later on in the book,
we’ll present it in a more general form.) Fortunately, the positive constant that may
multiply $v$ is immaterial: it has no effect on the ranking.
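The ranking rule described above can be sketched as follows. Everything here is hypothetical: the three site names, the link matrix, the set of sites containing the keyword, and the fixed number of steps used to approximate the steady state. It illustrates the idea only, and makes no claim about how any real search engine is implemented.

```python
def rank_sites(A, names, hits, steps=200):
    """Return the sites in `hits`, ordered by decreasing steady-state mass."""
    n = len(names)
    v = [1.0 / n] * n                      # start from a uniform distribution
    for _ in range(steps):                 # fixed step count: an assumption
        v = [sum(A[j][i] * v[i] for i in range(n)) for j in range(n)]
    score = dict(zip(names, v))
    return sorted((name for name in names if name in hits),
                  key=lambda name: score[name], reverse=True)

names = ["a.example", "b.example", "c.example"]   # hypothetical sites
A = [[0.0, 0.0, 0.5],                              # hypothetical Markov matrix
     [0.5, 0.0, 0.0],
     [0.5, 1.0, 0.5]]

ranking = rank_sites(A, names, hits={"a.example", "b.example"})
print(ranking)
```

Since the steady state of this matrix is proportional to (2, 1, 4), the site a.example is ranked above b.example. Multiplying v by any positive constant would leave the order unchanged.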


3.5 Exercises 


3.5.1 Gersgorin’s Theorem 


1. Let $A$ be an $n \times n$ (complex) matrix. Let $\lambda$ be an eigenvalue, associated with the
eigenvector $v$:

\[
Av = \lambda v, \quad v \ne 0.
\]

Pick a specific $i$, for which $v_i$ is a maximal component in $v$ (in absolute value):

\[
|v_j| \le |v_i|, \quad 1 \le j \le n.
\]

Show that

\[
|v_i| > 0.
\]

Hint: otherwise, $v = 0$, which is impossible for an eigenvector.

2. For the above $i$, show that the eigenvalue is not too far from the main-diagonal
element:

\[
|\lambda - a_{i,i}| \le \sum_{1 \le j \le n,\ j \ne i} |a_{i,j}|.
\]

This is Gersgorin’s theorem. Hint:

\[
\begin{aligned}
|\lambda - a_{i,i}| \cdot |v_i|
&= |(\lambda - a_{i,i}) v_i| \\
&= \Big| \sum_{1 \le j \le n,\ j \ne i} a_{i,j} v_j \Big| \\
&\le \sum_{1 \le j \le n,\ j \ne i} |a_{i,j} v_j| \\
&= \sum_{1 \le j \le n,\ j \ne i} |a_{i,j}| \cdot |v_j| \\
&\le \sum_{1 \le j \le n,\ j \ne i} |a_{i,j}| \cdot |v_i| \\
&= |v_i| \sum_{1 \le j \le n,\ j \ne i} |a_{i,j}|.
\end{aligned}
\]

Now, divide by $|v_i| > 0$.
3. For the above $i$, conclude that the eigenvalue is as small as the row-sum (in
absolute value):

\[
|\lambda| \le \sum_{j=1}^{n} |a_{i,j}|.
\]

Hint: use Gersgorin’s theorem:

\[
|\lambda| - |a_{i,i}| \le |\lambda - a_{i,i}| \le \sum_{1 \le j \le n,\ j \ne i} |a_{i,j}|.
\]

4. Conclude that

\[
\rho(A) \le \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{i,j}|.
\]

Hint: the above could be done for each and every eigenvalue.

5. Do the same for the Hermitian adjoint. Conclude that

\[
\rho(A) = \rho(A^h) \le \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{i,j}|.
\]

6. Assume now that $A$ is a Markov matrix. What does this mean? Hint: its elements
are positive or zero, and its columns sum to 1.

7. Must $A$ have a real spectrum? Hint: no, because $A$ might be nonsymmetric.

8. What about the transpose matrix, $A^t$? Must it be a Markov matrix as well? Hint:
the rows of $A$ do not necessarily sum to 1.

9. Show that

\[
\rho(A^t) \le 1.
\]

Hint: use Gersgorin’s theorem above.

10. Does this necessarily mean that

\[
\rho(A^t) = 1?
\]

Hint: only if you could find at least one eigenvalue of modulus 1.

11. Design an eigenvector for $A^t$. Hint: the constant vector $c$.

12. Show that

\[
A^t c = c.
\]

13. Conclude that

\[
1 \in \mathrm{spectrum}(A^t).
\]


14. Conclude that

\[
\rho(A^t) = 1.
\]

15. Conclude that

\[
1 \in \mathrm{spectrum}(A)
\]

as well. Hint: see Chap. 1, Sect. 1.9.3. After all, the complex conjugate of 1 is 1
as well.

16. Conclude that

\[
\rho(A) = 1
\]

as well.

17. Conclude that the spectrum of $A$ lies in the closed unit disk in the complex
plane:

\[
\mathrm{spectrum}(A) \subset \{ z \in \mathbb{C} \mid |z| \le 1 \}.
\]

18. Let $v$ be the eigenvector of $A$ satisfying

\[
Av = v.
\]

Prove that $v$ indeed exists, and that $v \ne 0$. Hint: we’ve already seen that 1 is an
eigenvalue of $A$.

19. Must $v$ be the constant vector? Hint: the constant vector is an eigenvector of $A^t$,
but not necessarily of $A$.

20. Assume that $A$ is also irreducible (Sect. 3.4.2). What can you say now about $v$?
Hint: up to a scalar multiple, $v$ is unique, and has positive components only.

21. What can you say now about the spectrum of $A$? Hint: 1 is the only eigenvalue of
magnitude 1: all other eigenvalues are strictly smaller than 1 in absolute value:

\[
\mu \in \mathrm{spectrum}(A),\ \mu \ne 1 \implies |\mu| < 1.
\]

22. Could $A$ have an eigenvalue of the form

\[
\exp(\theta\sqrt{-1}) = \cos(\theta) + \sin(\theta)\sqrt{-1}
\]

for any $0 < \theta < 2\pi$? Hint: no, because this complex number has magnitude 1, but is
different from 1.

23. Conclude that the spectrum of $A$ lies in the open unit disk, plus the point 1:

\[
\mathrm{spectrum}(A) \subset \{ z \in \mathbb{C} \mid |z| < 1 \} \cup \{ 1 \}.
\]

(Here, ‘$\cup$’ means a union with the set that contains one element only: 1.)


24. Use the above properties to prove that the Markov chain indeed converges to a
steady state. Hint: see Sect. 3.4.3.

25. Why is the steady state unique? Hint: mass conservation.

26. How could this help design a good search engine on the Internet? Hint: see
Sect. 3.4.4.

27. How often should the search engine update $A$, and solve a new eigenvector
problem, to update $v$ as well?
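Before proving Gersgorin’s theorem, it may help to see it at work on a concrete matrix. The sketch below is limited to 2 x 2 matrices, whose eigenvalues follow from the quadratic formula; the sample matrix and all function names are assumptions made for illustration.

```python
import cmath

def eigenvalues_2x2(A):
    """Both roots of det(A - mu I) = mu^2 - trace*mu + det."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return [(trace + root) / 2, (trace - root) / 2]

def gersgorin_discs(A):
    """One (center, radius) pair per row: a_{i,i} and the off-diagonal row-sum."""
    n = len(A)
    return [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i))
            for i in range(n)]

def in_some_disc(lam, discs, tol=1e-12):
    """True if lam lies in at least one disc (tol guards against rounding)."""
    return any(abs(lam - center) <= radius + tol for center, radius in discs)

A = [[4.0, 1.0], [2.0, 3.0]]          # hypothetical sample matrix
discs = gersgorin_discs(A)            # centers 4 and 3, radii 1 and 2
eigs = eigenvalues_2x2(A)             # 5 and 2
max_row_sum = max(sum(abs(x) for x in row) for row in A)

print(discs, eigs, max_row_sum)
```

Both eigenvalues, 5 and 2, lie in a disc centered at a main-diagonal element, and the spectral radius 5 does not exceed the maximal row-sum, which is also 5 here.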


Chapter 4
Special Relativity: Algebraic Point of View


To model static shapes in the plane, the ancient Greeks introduced Euclidean geome- 
try. To model motion, on the other hand, Newton added a new time axis, perpendicular 
to the plane. This may help model a force, applied to an object from the outside to 
accelerate its original motion. 

This fits well in Plato’s philosophy: to think about a general concept, we must 
introduce a new word in our language, to represent not only one concrete instance 
but also the godly spirit behind all possible instances. Likewise, a Newtonian force 
acts from the outside, to give life to a static object. 

Einstein, on the other hand, returned the time dimension back into the very heart 
of geometry. This way, time is not different from any other spatial dimension. Once 
the time axis is united with the original spatial axes, we have a new four-dimensional 
manifold: spacetime. 

This is more in the spirit of Aristotle’s philosophy. A word takes its meaning not 
from the outside but from the very inside: the deep nature of the general concept it 
stands for. 

To introduce special relativity [18, 78], matrices are most useful. Indeed, to rep- 
resent a Lorentz transformation, just use a small 2 x 2 matrix. This may improve on 
Newtonian mechanics, and provide a more accurate way to add velocities. 

Thanks to this new matrix, we also have a new (relative) definition of energy and 
momentum. This is how true physical quantities should indeed be defined: completely 
independent of the coordinate system that happens to be used. To transform from 
system to system, just use the same 2 x 2 Lorentz matrix. 
4.1 Systems and Their Time


4.1.1 How to Add Velocities? 


In Newtonian mechanics, velocities are added linearly. Consider, for example, a
particle that travels at the constant velocity $v$ in some constant direction, while another
particle travels at the constant velocity $u$ in the opposite direction (Fig. 4.1). These
velocities are not absolute: they are only relative to our lab.

Still, our lab is not an absolute reference point: it is not static, but dynamic as 
well. After all, it is on the earth, which travels around the sun, which travels around 
the center of our galaxy: the Milky Way. Fortunately, this underlying motion could 
be disregarded. After all, we don’t feel it at all. In fact, we are only interested in the 
velocities of the particles with respect to our lab. For this purpose, we may assume 
that the lab is at rest. 

Better yet, we are even more interested in the relation between the individual 
particles: how fast do they get away from each other? For this purpose, we better 
eliminate the lab altogether, and look at one particle from the other. 

How fast does the first particle get away from the second one? In Newton’s theory, 
the velocities add up, so the answer is simply u + v. For small velocities, this indeed 
makes sense. But what happens when v is as large as the speed of light c? In this case, 
the sum c + u would exceed the speed of light, which is impossible, as is evident 
from experiment. 

Fortunately, velocities are added nonlinearly: the accurate answer is not $u + v$ but
rather

\[
\frac{u + v}{1 + \frac{uv}{c^2}}
\]

(Fig. 4.2). Clearly, so long as both $u$ and $v$ are moderate, this is nearly the same as $u + v$.
This is why Newton’s theory works well in practical engineering problems. Still,
strictly speaking, it is not quite accurate. At high velocities, this subtle inaccuracy
becomes crucial.
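A short numerical comparison makes the point concrete. In this sketch the function name and the sample velocities are assumptions chosen for illustration; the constant is the defined value of the speed of light in meters per second.

```python
C = 299_792_458.0   # speed of light in m/s

def add_velocities(u, v, c=C):
    """Relativistic sum (u + v) / (1 + u v / c^2)."""
    return (u + v) / (1.0 + u * v / (c * c))

slow = add_velocities(30.0, 40.0)          # everyday velocities, in m/s
fast = add_velocities(0.9 * C, 0.9 * C)    # two particles at 90% of c

print(slow)        # practically indistinguishable from 70
print(fast / C)    # roughly 0.9945, far from the Newtonian 1.8
```

For everyday velocities the correction is far below any measurable level, whereas at 90% of the speed of light the Newtonian sum would be badly wrong.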


Fig. 4.1 In our lab, the first particle moves rightwards at velocity $v$, while the second particle moves
leftwards at velocity $u$


Fig. 4.2 Away from the second particle, the first particle moves at velocity $(u + v)/(1 + uv/c^2)$,
not $u + v$
4.1.2 Never Exceed the Speed of Light!


Thanks to the above formula, we can now be consistent, and keep a universal rule:
never exceed the speed of light! Indeed, assume that, with respect to the lab, neither
particle exceeds the speed of light:

\[
|u| \le c \quad \text{and} \quad |v| \le c.
\]

Then, with respect to each other, the particles don’t exceed the speed of light either.
To prove this algebraically, we must first scale the velocities properly.

The original velocities are not scaled right yet. To be scaled better, they should
be divided by the speed of light:

\[
\beta_u \equiv \frac{u}{c} \quad \text{and} \quad \beta_v \equiv \frac{v}{c}.
\]

Einstein’s law says that $c$ is the same in all systems. Furthermore, no speed could
ever exceed $c$:

\[
|\beta_u| \le 1 \quad \text{and} \quad |\beta_v| \le 1.
\]


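Furthermore, assume that both $u$ and $v$ are strictly smaller (in magnitude) than the
speed of light:

\[
|u| < c \quad \text{and} \quad |v| < c,
\]

so

\[
|\beta_u| < 1 \quad \text{and} \quad |\beta_v| < 1.
\]

In this case,

\[
1 - \beta_u - \beta_v + \beta_u \beta_v = (1 - \beta_u)(1 - \beta_v) > 0,
\]

so

\[
\beta_u + \beta_v < 1 + \beta_u \beta_v,
\]

or (since $1 + \beta_u \beta_v > 0$)

\[
\frac{\beta_u + \beta_v}{1 + \beta_u \beta_v} < 1.
\]

Moreover, if both $u$ and $v$ change sign, then the above is still valid. Therefore, we
also have

\[
\left| \frac{\beta_u + \beta_v}{1 + \beta_u \beta_v} \right| = \frac{|\beta_u + \beta_v|}{1 + \beta_u \beta_v} < 1.
\]

By multiplying both sides of this inequality by $c$, we also have

\[
\left| \frac{u + v}{1 + \frac{uv}{c^2}} \right| < c.
\]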


This is a good result: our new formula for adding velocities is consistent! Indeed, 
with respect to each other, the particles never exceed the speed of light, as required. 

So far, we’ve assumed that no particle is as fast as light. What happens when one
of them is? In this case, things may get strange. Consider, for instance, the following
extreme case:

\[
v = c \quad \text{and} \quad -c < u \le c.
\]

In this case,

\[
\frac{u + v}{1 + \frac{uv}{c^2}} = \frac{u + c}{1 + \frac{u}{c}} = c.
\]


This may still make sense: the first particle is so fast that it can no longer distinguish 
between the lab and the second particle. It views them as one and the same thing, 
left behind at the speed of light. 

Still, there is a yet stranger case:

\[
v = c \quad \text{and} \quad u = -c.
\]


In this (singular) case, the second particle is as fast: it follows the first particle at the 
speed of light as well. In these extreme circumstances, our rule is no good: it divides 
zero by zero. 

Fortunately, this case is degenerate and uninteresting: at the initial time of t = 0, 
both particles are at the origin, and have the same speed: c. As such, they are no longer 
distinguishable: they must be one and the same particle. In this sense, although the 
mathematical model fails, physics is still valid. 
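The three cases discussed in this section can be replayed numerically in terms of the scaled velocities. This is only a sketch: the function name and the sample grid of scaled velocities are assumptions, and the grid check is an illustration, not a proof.

```python
def add_scaled(beta_u, beta_v):
    """Scaled velocity sum (beta_u + beta_v) / (1 + beta_u * beta_v)."""
    return (beta_u + beta_v) / (1.0 + beta_u * beta_v)

# Case 1: both particles slower than light (sample grid from -0.9 to 0.9).
betas = [k / 10.0 for k in range(-9, 10)]
all_below_light = all(abs(add_scaled(bu, bv)) < 1.0
                      for bu in betas for bv in betas)

# Case 2: the first particle is as fast as light (beta_v = 1).
light_case = [add_scaled(bu, 1.0) for bu in betas]

# Case 3: the singular case beta_v = 1, beta_u = -1 divides zero by zero.
try:
    add_scaled(-1.0, 1.0)
    singular = False
except ZeroDivisionError:
    singular = True

print(all_below_light, light_case[0], singular)
```

On the grid, the combined scaled velocity never reaches 1. With one particle at the speed of light, the result is always 1. In the singular case, the formula breaks down, just as described above.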


4.1.3 How to Measure Time? 


In our original lab, consider the Cartesian coordinates $x$, $y$, and $z$. Assume that, at
the initial time $t = 0$, the first particle lies at the origin (0, 0, 0). From there, it starts
moving rightwards, toward (1, 0, 0).

Because there is no external force, the particle never accelerates or changes direc- 
tion. This models a motion in some fixed direction. After all, the x-axis could have 
been picked to fit the original direction. This is actually a one-dimensional motion: 
both the y and z coordinates take no part in the motion, and remain the same in all 
systems. Thus, they could be ignored and dropped: we “live” in the x-t plane only. 

How should the time $t$ be measured? Well, in a standard clock, a time unit is often
measured in terms of length. For example, in one second, the second hand in the clock
makes an angle of 6°: 1/60 of a complete circle.

In the present context, on the other hand, we might want to use a linear (rather 
than circular) clock. For this purpose, just use a light beam. In each second, the light 
advances one more light second. This tells us that one more second has passed. 
This way, time has been scaled by $c$. Instead of the original time variable $t$, measured
in seconds, we actually use the length variable $ct$, measured in light seconds.
After all, while $t$ increases by one second, $ct$ advances by one light second.

Thus, the mysterious time variable $t$ is better realized by the new length variable
$ct$, in a new axis: the $ct$-axis. This way, instead of the $x$-$t$ plane, we actually focus
on the new $x$-$ct$ plane.


4.1.4 The Self-System 


The original coordinates $x$, $y$, and $z$ tell us the position in the lab. Similarly, $t$ tells
us the time, as measured in the lab. Later on, we’ll see that these measurements are
relevant in the lab only. In other systems, on the other hand, they might be different.

For example, the particle also has a self-system that travels with it in the same
direction and at the same speed: $v$. This system has new (prime) coordinates: $x'$
and $t'$. This prime has nothing to do with differentiation: $t'$ just tells us how much
time has passed since the initial time $t' = 0$, as measured by a clock carried by the
self-system, traveling at speed $v$ with respect to the lab. Likewise, $x'$ tells us our
position with respect to the particle: how far we are from it. If we are to the right of
the particle, then $x' > 0$. If, on the other hand, we are to the left of the particle, then
$x' < 0$. After all, at all times $t' \ge 0$, the particle remains at the same position relative
to itself: $x' = 0$. As before, $y' = y$ and $z' = z$ remain irrelevant, and can be ignored.

How to measure the time in the self-system? As before, better use not $t'$ but $ct'$.
Initially, at $t' = t = 0$, the particle lies at the origin in both systems:

\[
\begin{pmatrix} x \\ y \\ z \\ ct \end{pmatrix}
= \begin{pmatrix} x' \\ y' \\ z' \\ ct' \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.
\]

At any later time, on the other hand, the systems may start to differ, not only in terms
of position but also in terms of time: $t'$ may differ from $t$. Time is relative: it depends
on the system where it is measured.


4.2 Lorentz Transformation and Matrix 


4.2.1 Lorentz Transformation 


In the self-system, the particle is at rest at

\[
x' = 0.
\]

In the lab, on the other hand, the particle moves at speed $v$, so it is at

\[
x = \frac{dx}{dt} t = vt = \frac{v}{c} ct = \beta_v ct,
\]

where

\[
\beta_v \equiv \frac{v}{c} = \frac{dx}{d(ct)}
\]

is the scaled velocity, obtained from differentiation with respect to the scaled time
$ct$.

Consider now a more general point at a fixed distance $x'$ from the particle. So, in
the self-system, it is at rest at position $x'$. In the lab, on the other hand, it moves at
speed $v$, so it is at

\[
x = x' + vt = x' + \beta_v ct.
\]

In other words,

\[
x' = x - \beta_v ct.
\]


Let’s use this to transform the original x-ct lab coordinates to the new x’-ct’ self 
coordinates. This transformation must be invariant: insensitive to interchanging x 
and ct. After all, both measure distance: x measures the distance from the lab’s 
origin, whereas ct measures the distance made by a light beam (Sect. 4.1.3). Why 
not interchange them? 

Once the dummy coordinates y’ = y and z’ = z are dropped, things get clearer: 
we obtain the new Lorentz transformation that transforms x and ct into x’ and ct’: 


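$$\begin{pmatrix} x \\ ct \end{pmatrix} \to \begin{pmatrix} x' \\ ct' \end{pmatrix} = \gamma(\beta_v) \begin{pmatrix} 1 & -\beta_v \\ -\beta_v & 1 \end{pmatrix} \begin{pmatrix} x \\ ct \end{pmatrix},$$

where 

$$\gamma(\beta_v) \equiv \frac{1}{\sqrt{1 - \beta_v^2}}$$

is picked to make sure that the determinant is 1 (Chap. 2, Sect. 2.1.2). This is the 
Lorentz transformation that gives the self coordinates in terms of the lab coordinates. 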

The Lorentz transformation is given in terms of a symmetric 2 x 2 matrix. This 
way, time and space indeed relate symmetrically to each other. In principle, time is 
not different from any spatial dimension. 
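
The two properties just mentioned (determinant 1 and symmetry) are easy to check numerically. Here is a minimal Python sketch; the helper names `gamma`, `lorentz_matrix`, and `det2` are illustrative choices (not notation from the book), and the matrix is stored as a plain nested tuple.

```python
import math

def gamma(beta):
    """Scaling factor 1/sqrt(1 - beta^2), defined for |beta| < 1."""
    return 1.0 / math.sqrt(1.0 - beta * beta)

def lorentz_matrix(beta):
    """2 x 2 Lorentz matrix gamma(beta) * [[1, -beta], [-beta, 1]]."""
    g = gamma(beta)
    return ((g, -g * beta),
            (-g * beta, g))

def det2(m):
    """Determinant of a 2 x 2 matrix given as nested tuples."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

L = lorentz_matrix(0.6)   # scaled velocity v/c = 0.6, so gamma is 1.25
print(L)
print(det2(L))            # close to 1, up to rounding
```

The off-diagonal entries are equal by construction, which is the symmetry between x and ct discussed above.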


4.2.2 Lorentz Matrix and the Infinity Point 


The Lorentz transformation preserves area: thanks to the coefficient $\gamma$, the Lorentz matrix 
has determinant 1. 

What happens when $|v|$ is as large as c? In this case, $\gamma$ is no longer a number: it 
is the infinity point $\infty$. Still, it is assumed that 

$$0 \cdot \infty = 0.$$

We’ll return to this point later. 


Let’s look at another extreme case: v = 0. In this case, the Lorentz matrix is just the 
2 x 2 identity matrix: 

$$I \equiv \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

This looks rather boring. Still, it may help formulate the Lorentz matrix in the general 
case as well. 
For this purpose, define also the 2 x 2 matrix 

$$J \equiv \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

The original Lorentz matrix can now be written as 

$$\gamma(\beta_v)\,(I - \beta_v J).$$


This will be very helpful later on. 


4.2.3 Invariance 


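Fortunately, the Lorentz matrix commutes with J: 

$$J\,\gamma(\beta_v)(I - \beta_v J) = \gamma(\beta_v)(I - \beta_v J)\,J.$$

Therefore, the Lorentz transformation is invariant under interchanging x and ct: 

$$\begin{pmatrix} ct \\ x \end{pmatrix} = J \begin{pmatrix} x \\ ct \end{pmatrix} \to \gamma(\beta_v)(I - \beta_v J)\,J \begin{pmatrix} x \\ ct \end{pmatrix} = J\,\gamma(\beta_v)(I - \beta_v J) \begin{pmatrix} x \\ ct \end{pmatrix} = J \begin{pmatrix} x' \\ ct' \end{pmatrix} = \begin{pmatrix} ct' \\ x' \end{pmatrix}.$$

In other words, the Lorentz transformation is blind (completely insensitive) to 
interchanging x and ct, as required. 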


4.2.4 Composition of Lorentz Transformations 


The Lorentz transformation may also help explain the rule of adding velocities 
(Sect.4.1.1). For this purpose, note that every two Lorentz matrices commute with 
each other. 

Consider the composition of two Lorentz transformations. What does this mean 
geometrically? Well, let’s look at our first particle, moving rightwards at speed v. 
The second particle, on the other hand, moves leftwards at speed u, or rightwards 
at speed —u. These velocities are with respect to our original lab, which is assumed 
to be at rest. 

Fortunately, we can also look at things the other way around: from the perspective 
of the second particle, the entire lab “moves” rightwards at speed u. This way, the 
second particle is now assumed to be at rest. How does the first particle travel away 
from it? 

To calculate this, we need to compose two motions: the motion of the first particle 
with respect to the lab, on top of the “motion” of the entire lab away from the second 
particle. In other words, we need to compose two Lorentz transformations, or just 
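multiply two Lorentz matrices. Since 

$$J^2 = I,$$

the product is 

$$\begin{aligned}
\gamma(\beta_u)(I - \beta_u J)\,\gamma(\beta_v)(I - \beta_v J)
&= \gamma(\beta_u)\gamma(\beta_v)(I - \beta_u J)(I - \beta_v J) \\
&= \gamma(\beta_u)\gamma(\beta_v)\left(I - \beta_v J - \beta_u J + \beta_u\beta_v J^2\right) \\
&= \gamma(\beta_u)\gamma(\beta_v)\left((1 + \beta_u\beta_v) I - (\beta_u + \beta_v) J\right) \\
&= \gamma(\beta_u)\gamma(\beta_v)(1 + \beta_u\beta_v)\left(I - \frac{\beta_u + \beta_v}{1 + \beta_u\beta_v} J\right) \\
&= \gamma\!\left(\frac{\beta_u + \beta_v}{1 + \beta_u\beta_v}\right)\left(I - \frac{\beta_u + \beta_v}{1 + \beta_u\beta_v} J\right).
\end{aligned}$$

After all, each Lorentz matrix has determinant 1, so the product must have 
determinant 1 as well (Chap. 2, Sect. 2.1.3). 

This composition describes the total motion of the first particle away from the 
second one. The total velocity is, thus, not u + v but rather 

$$\frac{\beta_u + \beta_v}{1 + \beta_u\beta_v}\,c = \frac{u + v}{1 + \frac{uv}{c^2}},$$
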


as asserted in Sect. 4.1.1. 
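
The composition rule can also be confirmed numerically. The following Python sketch multiplies two Lorentz matrices and compares the product with the single Lorentz matrix of the combined scaled velocity. The function names (`lorentz_matrix`, `matmul2`, `add_velocities`) and the sample values 0.5 and 0.8 are assumptions made for illustration only.

```python
import math

def lorentz_matrix(beta):
    """2 x 2 Lorentz matrix gamma(beta) * (I - beta*J) as nested tuples."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return ((g, -g * beta), (-g * beta, g))

def matmul2(a, b):
    """Product of two 2 x 2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def add_velocities(beta_u, beta_v):
    """Relativistic sum of two scaled velocities."""
    return (beta_u + beta_v) / (1.0 + beta_u * beta_v)

beta_u, beta_v = 0.5, 0.8
product = matmul2(lorentz_matrix(beta_u), lorentz_matrix(beta_v))
direct = lorentz_matrix(add_velocities(beta_u, beta_v))

# Largest entrywise difference between the two matrices (rounding only).
max_diff = max(abs(product[i][j] - direct[i][j])
               for i in range(2) for j in range(2))
print(add_velocities(beta_u, beta_v), max_diff)
```

Note that the combined scaled velocity stays below 1 even though 0.5 + 0.8 exceeds 1.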


4.2.5 The Inverse Transformation 


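Let’s look at the special case in which u = −v: the second particle coincides with 
the first one. In this case, the above composition takes the form 

$$\begin{aligned}
\gamma(\beta_{-v})(I - \beta_{-v} J)\,\gamma(\beta_v)(I - \beta_v J)
&= \gamma\!\left(\frac{\beta_{-v} + \beta_v}{1 + \beta_{-v}\beta_v}\right)\left(I - \frac{\beta_{-v} + \beta_v}{1 + \beta_{-v}\beta_v} J\right) \\
&= \gamma(0)\,I \\
&= I.
\end{aligned}$$

Thus, the inverse transformation is represented by the inverse matrix: 

$$\gamma(\beta_{-v})(I - \beta_{-v} J) = \gamma(\beta_v)(I + \beta_v J).$$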


This is a legitimate Lorentz matrix as well: it has determinant 1, as required. 

What is this geometrically? Well, with respect to the first particle, the entire 
lab “moves” leftwards at speed v, or rightwards at speed —v. Thus, the inverse 
transformation may differ from the original one in sign only: replace v by —v. This 
way, the inverse transformation indeed transforms the self coordinates back to the 
original lab coordinates, as required. 

In other words, the inverse transformation considers the particle to be at rest, 
and the entire lab as moving at speed —v away from it. This is why the inverse 
transformation only picks a minus sign: it uses —v rather than v. 
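
As a small sanity check of this sign flip, the sketch below multiplies the Lorentz matrix for −v by the one for v and prints the result, which should be the identity up to rounding. The helper names and the sample value 0.6 are illustrative assumptions.

```python
import math

def lorentz_matrix(beta):
    """2 x 2 Lorentz matrix gamma(beta) * (I - beta*J) as nested tuples."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return ((g, -g * beta), (-g * beta, g))

def matmul2(a, b):
    """Product of two 2 x 2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

beta_v = 0.6
# Replacing v by -v gives gamma(beta_v) * (I + beta_v*J), the inverse matrix.
inverse = lorentz_matrix(-beta_v)
product = matmul2(inverse, lorentz_matrix(beta_v))
print(product)   # close to ((1, 0), (0, 1))
```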

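The inverse Lorentz matrix could also be obtained from Cramer’s formula in 
Chap. 2, Sect. 2.1.4: 

$$\begin{pmatrix} 1 & -\beta_v \\ -\beta_v & 1 \end{pmatrix}^{-1} = \frac{1}{1 - \beta_v^2} \begin{pmatrix} 1 & \beta_v \\ \beta_v & 1 \end{pmatrix} = \gamma^2(\beta_v) \begin{pmatrix} 1 & \beta_v \\ \beta_v & 1 \end{pmatrix}.$$

Indeed, just divide both sides by $\gamma(\beta_v)$, and you obtain the same inverse matrix as 
before. 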
Later on, in the book, we’ll see a new theory: group theory. We’ll then realize that 
the subgroup of Lorentz transformations is represented by (and indeed isomorphic 
to) the subgroup of Lorentz matrices: they mirror each other, and are algebraically 
the same (Chap. 5, Sect. 5.1.4). Furthermore, both subgroups are also homeomorphic 
to the open interval (−1, 1): 

$$\gamma(\beta_v)(I - \beta_v J) \to \beta_v \in (-1, 1).$$


4.3 Proper Time in the Self-System 


4.3.1 Proper Time 


In its self system, the particle is always at x’ = 0. Imagine a tiny clock embedded 
inside the particle. In the self-system, what would be the time read from this clock? 
This is the proper time of the particle: 

$$s \equiv t'.$$


Fortunately, this time could be calculated not only from the self system but also from 
any other system, such as our lab. 
In the x’-t’ self-system, the tiny clock is at 

$$(x', t') = (0, s).$$


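Let’s use this to form the matrix 

$$\begin{pmatrix} ct' & x' \\ x' & ct' \end{pmatrix} = \begin{pmatrix} cs & 0 \\ 0 & cs \end{pmatrix} = csI.$$

Clearly, this matrix has determinant $c^2s^2$. Let’s apply the inverse Lorentz matrix to 
it. Let’s do this column by column. Let’s start with the second column: the inverse 
Lorentz matrix transforms it back to the x-t lab coordinates: 

$$\begin{pmatrix} x' \\ ct' \end{pmatrix} \to \gamma(\beta_v)(I + \beta_v J) \begin{pmatrix} x' \\ ct' \end{pmatrix} = \begin{pmatrix} x \\ ct \end{pmatrix}.$$

Fortunately, the inverse Lorentz matrix commutes with J (Sect. 4.2.3). Therefore, 
the first column transforms in a similar way: 

$$\begin{pmatrix} ct' \\ x' \end{pmatrix} = J \begin{pmatrix} x' \\ ct' \end{pmatrix} \to \gamma(\beta_v)(I + \beta_v J) \begin{pmatrix} ct' \\ x' \end{pmatrix} = J \begin{pmatrix} x \\ ct \end{pmatrix} = \begin{pmatrix} ct \\ x \end{pmatrix}.$$

In summary, the entire matrix transforms to 

$$\begin{pmatrix} ct & x \\ x & ct \end{pmatrix} = \gamma(\beta_v)(I + \beta_v J) \begin{pmatrix} ct' & x' \\ x' & ct' \end{pmatrix} = cs\,\gamma(\beta_v)(I + \beta_v J).$$

Since the inverse Lorentz matrix has determinant 1, the above matrix still has 
determinant $c^2s^2$: 

$$\det \begin{pmatrix} ct & x \\ x & ct \end{pmatrix} = c^2t^2 - x^2 = c^2s^2.$$

Fig. 4.3 The proper time of the lab is just t. In the lab, it can be read from a static clock: $t_1$, $t_2$, .... 
This is the maximal proper time. A clock moving at the constant speed of x/t = v, on the other 
hand, has a shorter proper time: $t'_1 = s_1 < t_1$, then $t'_2 = s_2 < t_2$, and so on. (Axes: x and ct; 
marked values on the ct-axis: $cs_1$ and $cs_2$.) 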


So, we’ve managed to calculate the proper time not only from the self-system but 
also from the lab: 

$$s = \sqrt{t^2 - \frac{x^2}{c^2}}.$$

In these new terms, what is t? It is the proper time of the lab: the time read from 
a static clock in the lab. After all, this is how t was defined in the first place. Still, 
it has yet another (mathematical) meaning: the maximal proper time of any particle 
(Fig. 4.3). After all, every moving particle would have a shorter proper time. Indeed, 
at any time t, the particle would be at x = vt, so its proper time would be 

$$s = \sqrt{t^2 - \frac{x^2}{c^2}} = \sqrt{t^2 - \beta_v^2 t^2} = \frac{t}{\gamma(\beta_v)} \le t.$$


In summary, if you want to think that a lot of time has passed, then you had better 
look at your own static clock. Just hold it in your hand, and look at it. This way, it 
will tick fast, telling you that many seconds have passed. If, on the other hand, it 
moved towards or away from you, then it would tick more slowly, telling you that 
fewer seconds have passed. Later on, we’ll refer to this as time dilation. It leads to the 
twin paradox. 

Fig. 4.4 The twin paradox: I live in the lab. My twin, on the other hand, lives inside a particle, 
getting away at speed v. I say: “my time ticks faster, so I’m older!” My twin, on the other hand, 
sees things the other way around, and says: “my time ticks faster, so I’m older!” Who is right? 
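
The proper-time formula of Sect. 4.3.1 can be tried out with concrete numbers. In the Python sketch below, `proper_time` is a hypothetical helper name, and the sample values (a clock moving at 0.6c, observed for 10 lab seconds) are assumptions chosen only to make the arithmetic easy to follow.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def proper_time(t, x, c=C):
    """Proper time s with c^2 s^2 = c^2 t^2 - x^2 (needs |x| <= c*t)."""
    return math.sqrt(t * t - (x / c) ** 2)

beta_v = 0.6
t = 10.0                 # lab time, in seconds
x = beta_v * C * t       # lab position of the moving clock at time t
s = proper_time(t, x)

g = 1.0 / math.sqrt(1.0 - beta_v ** 2)
print(s, t / g)          # both close to 8 seconds
```

A static clock (x = 0) simply returns s = t, the maximal proper time.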


4.3.2 The Twin Paradox 


Suppose that I live in the lab. With me, I have a static clock to show me my proper 
time: t. I also have a twin brother, who lives inside the particle, moving at the constant 
velocity v away from me (Fig. 4.4). With him, he carries his own clock that shows 
him his own proper time: s. 

As discussed above, I think that my own proper time goes faster, so I’m older. 
My twin, on the other hand, views things the other way around. He thinks that he is 
static and that I travel at velocity —v away from him. Therefore, he believes that his 
own proper time goes faster, and that he is older than me. Who is right? 

Here is the answer. Later on, we’ll also discuss yet another effect: length con- 
traction. From my own point of view, distances in the moving system are shorter. 
Therefore, I’m bigger than my twin: the veins in my body are longer, and the blood 
in my body has a longer distance to flow to my heart. This requires more time. 
Fortunately, thanks to time dilation, I indeed have more time. In summary, both twins 
have the same metabolism, and age at the same rate. 

Why did we have a paradox in the beginning? Because we looked at a single 
position: x = 0 (or x’ = 0). Through x’ = 0, a complete line passes: the entire 
t’-axis. Once transformed from system to system, this axis scales differently, leading 
to the twin paradox. To avoid this paradox, better transform a more substantial area, 
with some thickness in the x-dimension as well. After all, as a transformation from 
$\mathbb{R}^2$ to $\mathbb{R}^2$, the Lorentz transformation preserves area, as required. 


4.3.3 Hyperbolic Geometry: Minkowski Space 


In our original lab, let’s look at a fixed time $t = t_0 > 0$. At $t_0$, where could the 
particle be? Well, this depends on its velocity: to reach x, the velocity must have 
been $v = x/t_0$. Let’s look at all those x’s that could have been reached by any 
particle, traveling at any possible speed v, not exceeding the speed of light: $|v| \le c$. 
This makes a horizontal line segment in the x-t plane: 

$$\{(x, t_0) \mid |x| \le ct_0\}.$$

Fig. 4.5 A level set of s: a hyperbola in the original x-t lab coordinates. (x, t) is on the hyperbola 
if x could be reached at time t by a particle moving at speed v = x/t with respect to the lab. In the 
self-system of the particle, on the other hand, this will happen at proper time $s_0$. (Axes: x and ct; 
the hyperbola $c^2t^2 - x^2 = \text{const.} = c^2s_0^2$ meets the ct-axis at $cs_0$.) 


This is a level set of t. Indeed, in it, t is constant: $t = t_0$. Still, now we know better: 
because it moves at its own speed v, the particle also has its own proper time s. Thus, 
we should actually look at a level set of s: the hyperbola 

$$\left\{(x, t) \mid c^2t^2 - x^2 = c^2s_0^2\right\},$$

where $s_0$ is constant. 

The motion of such a particle is modeled by the arrow in Fig. 4.5. Once the arrow 
hits the hyperbola, the particle arrives at x. The tiny clock embedded inside it will 
then show its proper time: $s_0$. 

Finally, let’s look at all possible $s_0$’s. Together, all these level sets make a new 
manifold: the two-dimensional x-s manifold. 
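
To see that a level set of s really is the same hyperbola in every system, one can check that the Lorentz transformation leaves $c^2t^2 - x^2$ unchanged. The Python sketch below does this for one sample point; the function names and the numbers are assumptions for illustration (the point x = 3, ct = 5 lies on the path of a particle with scaled velocity 0.6).

```python
import math

def lorentz_apply(beta, x, ct):
    """Apply gamma(beta) * [[1, -beta], [-beta, 1]] to the vector (x, ct)."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return g * (x - beta * ct), g * (ct - beta * x)

def interval(x, ct):
    """The quantity c^2 t^2 - x^2, written in terms of ct."""
    return ct * ct - x * x

x, ct = 3.0, 5.0
xp, ctp = lorentz_apply(0.6, x, ct)

print(xp, ctp)                               # close to (0, 4): the particle's own system
print(interval(x, ct), interval(xp, ctp))    # both close to 16
```

Here the transformed point has x’ = 0 and ct’ = 4, so the proper time satisfies $cs_0 = 4$ in both descriptions.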


4.3.4 Length Contraction 


In its self system, the particle is always at x’ = 0. In the lab, on the other hand, it is 
at x. Where is this? To answer this, you must know where 0 is in the lab. From 0, 
measure x, and you arrive at the correct location. 

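Thus, the location has no meaning on its own, but only relative to a reference 
point: 0. What is meaningful is the distance between two different locations. 

Consider, for example, a stick that moves at velocity v with respect to our lab. In 
its self-system, the stick is at rest: one endpoint at $x'_1$, and the other at $x'_2$. Thus, in 
its own system, its length is 

$$\Delta x' \equiv x'_2 - x'_1.$$

What is the view from the lab? Well, let’s use the Lorentz transformation: 

$$\begin{pmatrix} x'_1 \\ ct'_1 \end{pmatrix} = \gamma(\beta_v) \begin{pmatrix} 1 & -\beta_v \\ -\beta_v & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ ct_1 \end{pmatrix}$$

and 

$$\begin{pmatrix} x'_2 \\ ct'_2 \end{pmatrix} = \gamma(\beta_v) \begin{pmatrix} 1 & -\beta_v \\ -\beta_v & 1 \end{pmatrix} \begin{pmatrix} x_2 \\ ct_2 \end{pmatrix}.$$

Now, let’s subtract the former equation from the latter: 

$$\begin{pmatrix} \Delta x' \\ c\Delta t' \end{pmatrix} = \gamma(\beta_v) \begin{pmatrix} 1 & -\beta_v \\ -\beta_v & 1 \end{pmatrix} \begin{pmatrix} \Delta x \\ c\Delta t \end{pmatrix}.$$

To measure the length of the moving stick, a viewer who sits in the lab has no access 
to the self-system: he/she must use the x-t lab coordinates. For this purpose, he/she 
must have both endpoints $x_1$ and $x_2$ at the same time $t_1 = t_2$: 

$$\begin{pmatrix} \Delta x' \\ c\Delta t' \end{pmatrix} = \gamma(\beta_v) \begin{pmatrix} 1 & -\beta_v \\ -\beta_v & 1 \end{pmatrix} \begin{pmatrix} \Delta x \\ 0 \end{pmatrix} = \gamma(\beta_v) \begin{pmatrix} \Delta x \\ -\beta_v \Delta x \end{pmatrix}.$$

In this equation, the top tells us that 

$$\Delta x' = \gamma(\beta_v)\,\Delta x,$$

or 

$$\Delta x = \frac{\Delta x'}{\gamma(\beta_v)}.$$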


Since $\gamma \ge 1$, $|\Delta x| \le |\Delta x'|$. This is called length contraction. From its own 
self-system, the stick looks longer than from any other system (such as our lab). We’ve already 
used this effect to “solve” the twin paradox (Sect. 4.3.2). 

Moreover, the above length $\Delta x$ (observed from the lab) decreases monotonically 
as $|v|$ increases. In the extreme case of $|v| = c$ and $\gamma = \infty$, for example, $\Delta x = 0$. 
This means that, from the lab, the stick travels so fast that it seems to shrink to one 
point. 

This confirms what was already said at the end of Sect. 4.1.2: two particles that 
follow each other at the speed of light are indistinguishable: they could be 
considered as one and the same. This is also why, in a particle that travels at the speed of 
light, no change could ever be observed. 
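
To get a feeling for the numbers, here is a short Python sketch that tabulates the contracted length of a stick of rest length 1 at a few scaled velocities. The function name `contracted_length` and the sample velocities are illustrative assumptions.

```python
import math

def gamma(beta):
    """Scaling factor 1/sqrt(1 - beta^2), defined for |beta| < 1."""
    return 1.0 / math.sqrt(1.0 - beta * beta)

def contracted_length(rest_length, beta):
    """Length observed from the lab: (rest length) / gamma(beta)."""
    return rest_length / gamma(beta)

# The observed length shrinks monotonically as the speed grows.
for beta in (0.0, 0.6, 0.8, 0.99):
    print(beta, contracted_length(1.0, beta))
```

At a scaled velocity of 0.6, for example, the stick appears at 80 percent of its rest length.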


4.3.5 Simultaneous Events 


In the above, in the lab, both endpoints are measured at the same time $t_1 = t_2$. These 
are indeed simultaneous events. Still, in the x-t plane, they are not identical. After 
all, they take place in two different locations: $x_1 \ne x_2$. 

In the self-system, on the other hand, these events are no longer simultaneous. 
Indeed, in the equation in Sect. 4.3.4, look now at the bottom: 

$$c\Delta t' = -\gamma(\beta_v)\,\beta_v \Delta x = -\beta_v \Delta x' \ne 0.$$


Thus, the events are simultaneous in the lab only. In every other system, on the other 
hand, they are no longer simultaneous. 


4.3.6 Time Dilation 


Thus, in spacetime, two events could differ in time or location or both. So far, we 
discussed simultaneous events that happen at the same time. Next, let’s consider 
events that take place at the same location, but at different times. 

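Consider again a tiny clock that moves at velocity v with respect to the lab 
(Fig. 4.3). In this clock, two different times are measured: $t'_2 > t'_1$. This is done 
at the same place $x'_2 = x'_1$ in the self-system of the clock. 

Thus, in the self-system, the time difference is 

$$\Delta t' \equiv t'_2 - t'_1.$$

This is indeed the proper time: the time in a clock that is at rest (Sect. 4.3.1). 
A viewer who looks at the moving clock from the lab, on the other hand, measures 
the time difference 

$$\Delta t \equiv t_2 - t_1.$$

Is this the same? Well, from the inverse Lorentz transformation, we have 

$$\begin{pmatrix} x_1 \\ ct_1 \end{pmatrix} = \gamma(\beta_v) \begin{pmatrix} 1 & \beta_v \\ \beta_v & 1 \end{pmatrix} \begin{pmatrix} x'_1 \\ ct'_1 \end{pmatrix}$$

and 

$$\begin{pmatrix} x_2 \\ ct_2 \end{pmatrix} = \gamma(\beta_v) \begin{pmatrix} 1 & \beta_v \\ \beta_v & 1 \end{pmatrix} \begin{pmatrix} x'_2 \\ ct'_2 \end{pmatrix}.$$

Let’s subtract the former equation from the latter: 

$$\begin{aligned}
\begin{pmatrix} \Delta x \\ c\Delta t \end{pmatrix}
&= \gamma(\beta_v) \begin{pmatrix} 1 & \beta_v \\ \beta_v & 1 \end{pmatrix} \begin{pmatrix} \Delta x' \\ c\Delta t' \end{pmatrix} \\
&= \gamma(\beta_v) \begin{pmatrix} 1 & \beta_v \\ \beta_v & 1 \end{pmatrix} \begin{pmatrix} 0 \\ c\Delta t' \end{pmatrix} \\
&= \gamma(\beta_v) \begin{pmatrix} \beta_v c\Delta t' \\ c\Delta t' \end{pmatrix}.
\end{aligned}$$

In this equation, the bottom tells us that 

$$\Delta t = \gamma(\beta_v)\,\Delta t'.$$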


Since $\gamma \ge 1$, $\Delta t \ge \Delta t'$. This is time dilation: to read the shortest possible time from 
your clock, better read it from the self-system of the clock (where it is static), rather 
than from any lab that may travel toward or away from it. 

This is also the slowest time: it ticks more slowly, giving a smaller time difference: 
$\Delta t' \le \Delta t$. In the beginning, this had led to the twin paradox. Fortunately, together 
with length contraction, this makes perfect sense, and “solves” the twin paradox 
(Sect. 4.3.2). 

In the lab, the observed time difference $\Delta t$ increases monotonically with $|v|$. In 
the extreme case of $|v| = c$ and $\gamma = \infty$, $\Delta t = \infty$ as well. For this reason, in a 
particle that travels as fast as light, no change could ever be observed: every tiny 
change would seem to last forever. 
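
The derivation above can be replayed numerically: apply the inverse Lorentz matrix to the self-system difference (0, cΔt’) and read off both components. In this Python sketch, `inverse_lorentz_apply` is a hypothetical helper name, and the values (scaled velocity 0.6, cΔt’ = 1) are assumptions for illustration.

```python
import math

def inverse_lorentz_apply(beta, xp, ctp):
    """Apply gamma(beta) * [[1, beta], [beta, 1]] to the vector (x', ct')."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return g * (xp + beta * ctp), g * (ctp + beta * xp)

beta_v = 0.6
# Two ticks at the same place in the self-system: Delta x' = 0, c*Delta t' = 1.
dx, c_dt = inverse_lorentz_apply(beta_v, 0.0, 1.0)

print(c_dt)   # close to 1.25 = gamma: the lab sees a longer time difference
print(dx)     # close to 0.75 = gamma*beta: in the lab, the clock has moved
```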


4.4 Velocity and Slope 


4.4.1 Doppler’s Effect 


In the above, the particle gets away from the lab at speed v. Inside the particle, there 
is a tiny clock. We assume that a viewer who sits in the lab could still read the time 
from this clock. This is still quite theoretical: how could this be done in practice? 
After all, this information must travel from the clock back to the lab, at a finite speed, 
not exceeding the speed of light! 

In its own self-system, the tiny clock shows time t’. Once read from the lab, on 
the other hand, this time transforms to t. For example, as things are observed from 
the earth, at time $t_1 > 0$, the particle gets as far as $x_1 = vt_1$. At this time, a signal as 
fast as light issues from the particle, to carry the news back to the lab. To arrive, it 
needs some more time: $x_1/c = vt_1/c$. (Here, we assume for simplicity that v > 0, 
as in Fig. 4.4.) Denote the arrival time by $T_1$. Later on, at time $t_2 > t_1$, the next signal 
will issue as well, to arrive at $T_2 > T_1$. 

How to write the arrival time difference in terms of the real-time difference? In 
other words, how to write $T_2 - T_1$ in terms of the original time difference $t'_2 - t'_1$, 
read in the self-system itself? Well, thanks to time dilation (Sect. 4.3.6) and the above 
discussion, 

$$\begin{aligned}
\Delta T &\equiv T_2 - T_1 \\
&= t_2 + \frac{x_2}{c} - \left(t_1 + \frac{x_1}{c}\right) \\
&= t_2 + \frac{vt_2}{c} - \left(t_1 + \frac{vt_1}{c}\right) \\
&= t_2 - t_1 + \frac{v}{c}(t_2 - t_1) \\
&= (\Delta t)(1 + \beta_v) \\
&= (\Delta t')\,\gamma(\beta_v)(1 + \beta_v) \\
&= (\Delta t')\,\frac{1 + \beta_v}{\sqrt{1 - \beta_v^2}} \\
&= (\Delta t')\,\frac{1 + \beta_v}{\sqrt{(1 - \beta_v)(1 + \beta_v)}} \\
&= (\Delta t')\,\sqrt{\frac{1 + \beta_v}{1 - \beta_v}}.
\end{aligned}$$

Thus, since v > 0, a movie taken inside the particle would arrive to the lab in slow 
motion: an original activity that takes $\Delta t'$ seconds inside the particle would seem 
to take as many as $\sqrt{(1 + \beta_v)/(1 - \beta_v)}\,\Delta t'$ seconds upon being watched here in the 
lab. This is indeed Doppler’s effect. 
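
As a quick check, the slow-motion factor can be computed in two ways: as time dilation times (1 + β), or directly from the square-root formula. The Python sketch below compares the two; the function names and the sample value 0.6 are illustrative assumptions.

```python
import math

def gamma(beta):
    """Scaling factor 1/sqrt(1 - beta^2), defined for |beta| < 1."""
    return 1.0 / math.sqrt(1.0 - beta * beta)

def doppler_factor(beta):
    """Ratio (arrival time difference) / (self-system time difference)."""
    return math.sqrt((1.0 + beta) / (1.0 - beta))

beta_v = 0.6
two_step = gamma(beta_v) * (1.0 + beta_v)   # time dilation, then signal delay
print(two_step, doppler_factor(beta_v))     # both close to 2
```

So, at a scaled velocity of 0.6, the movie would be watched at half speed.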


4.4.2 Slope: Moebius Transformation 


How do things look from the second particle? Recall that this particle travels 
at speed —u with respect to the lab. To describe its self-system, let’s use now the 
x-t coordinates. After all, our convention is to use these coordinates in the system 
we’re interested in. This system is now not the lab but the self-system of the second 
particle. 

Let the lab system use now the x’-t’ coordinates. In these coordinates, how does 
the first particle move? Well, it moves at speed v, making a linear path or trajectory: 

$$x' = vt',$$

or 

$$\frac{dx'}{d(ct')} = \frac{dx'}{c\,dt'} = \frac{v}{c} = \beta_v.$$


In other words, in the two-dimensional vector $(x', ct')^t$, the components have a 
constant ratio: $\beta_v$. To transform this vector back to the x-t self-system of the second 
particle, apply an inverse Lorentz matrix to it: 

$$\begin{pmatrix} x' \\ ct' \end{pmatrix} \to \begin{pmatrix} x \\ ct \end{pmatrix} = \gamma(\beta_u) \begin{pmatrix} 1 & \beta_u \\ \beta_u & 1 \end{pmatrix} \begin{pmatrix} x' \\ ct' \end{pmatrix}.$$

Consider, for example, the vector $(\beta_v, 1)^t$. It lies in the above path: its components 
indeed have ratio $\beta_v$. Let’s transform it as above: 

$$\gamma(\beta_u) \begin{pmatrix} 1 & \beta_u \\ \beta_u & 1 \end{pmatrix} \begin{pmatrix} \beta_v \\ 1 \end{pmatrix} = \gamma(\beta_u) \begin{pmatrix} \beta_u + \beta_v \\ 1 + \beta_u\beta_v \end{pmatrix}.$$

This is indeed how the first particle looks from the x-t self-system of the second 
particle. In this system, to have the new slope, just divide the new top component by 
the new bottom component: 

$$\frac{dx}{c\,dt} = \frac{\beta_u + \beta_v}{1 + \beta_u\beta_v}.$$

This is indeed the correct way to add velocities (Sect. 4.1.1). After all, this new slope 
is also the velocity of the first particle, as observed from the second particle. 

The Lorentz transformation transforms a two-dimensional vector to a 
two-dimensional vector. The Moebius transformation, on the other hand, transforms a scalar 
to a scalar: the original slope to the new slope: 

$$\beta_v = \frac{dx'}{c\,dt'} \to \frac{dx}{c\,dt} = \frac{\beta_u + \beta_v}{1 + \beta_u\beta_v}$$

(Chap. 5). Let’s use it to calculate the perpendicular velocity as well. 
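
The relation between the two pictures (vector to vector versus slope to slope) can be illustrated in a few lines of Python. The names `moebius` and `inverse_lorentz_apply` are hypothetical helpers, and the sample velocities are assumptions for illustration.

```python
import math

def inverse_lorentz_apply(beta_u, xp, ctp):
    """Apply gamma(beta_u) * [[1, beta_u], [beta_u, 1]] to (x', ct')."""
    g = 1.0 / math.sqrt(1.0 - beta_u * beta_u)
    return g * (xp + beta_u * ctp), g * (ctp + beta_u * xp)

def moebius(beta_u, slope):
    """Map the old slope dx'/(c dt') to the new slope dx/(c dt)."""
    return (beta_u + slope) / (1.0 + beta_u * slope)

beta_u, beta_v = 0.5, 0.8
# Transform a vector with slope beta_v, then take the ratio of its components.
x, ct = inverse_lorentz_apply(beta_u, beta_v, 1.0)
print(x / ct, moebius(beta_u, beta_v))   # the same slope, computed two ways
```

The common factor $\gamma(\beta_u)$ cancels in the ratio, which is why the slope map needs no square root at all.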


4.4.3 Perpendicular Velocity 


Let us now consider a two-dimensional motion: not only in the x’ but also in the 
y’ spatial direction. For this purpose, our lab still uses primes in its coordinates: 
x’, y’, and t’. Assume now that the first particle moves at velocity $(v_{x'}, v_{y'})$ with 
respect to the lab: velocity $v_{x'}$ in the positive x’ direction, and also velocity $v_{y'}$ in 
the perpendicular y’ direction. (Note that these subscripts are not partial derivatives, 
but just coordinates.) The second particle, on the other hand, still moves at velocity 
(−u, 0) in the x’ direction only. This is how things look from the lab (Fig. 4.6). 

Fig. 4.6 View from the lab: the first particle travels at velocity $(v_{x'}, v_{y'})$, while the 
second particle travels at velocity (−u, 0) 

How do things look like from the second particle? Well, to describe the self-system 
of the second particle, let’s use the x-y-t coordinates. After all, our convention is to 
use these coordinates in the system we’re interested in. The lab, on the other hand, 
is less interesting: this is why it is described by the x’-y’-t’ coordinates. 

Now, in its own self-system, the second particle is at rest, while the entire lab 
moves at velocity (u, 0). In these terms, how does the first particle move? 

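Fortunately, the Moebius transformation in Sect. 4.4.2 can now extend, and 
transform not only dx’/dt’ but also dy’/dt’. For this purpose, we must also use an extended 
3 x 3 Lorentz matrix, which leaves the second component unchanged: 

$$\begin{pmatrix} x' \\ y' \\ ct' \end{pmatrix} \to \begin{pmatrix} x \\ y \\ ct \end{pmatrix} = \begin{pmatrix} \gamma(\beta_u) & & \\ & 1 & \\ & & \gamma(\beta_u) \end{pmatrix} \begin{pmatrix} 1 & & \beta_u \\ & 1 & \\ \beta_u & & 1 \end{pmatrix} \begin{pmatrix} x' \\ y' \\ ct' \end{pmatrix}.$$

(As usual, blank spaces stand for zero matrix elements.) 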

Still, we are mainly interested in slopes, or ratios between different components. 
After all, the slopes tell us in what direction the first particle gets farther and farther 
away from the second one (Fig. 4.7). So, the above three-dimensional vector could 
be multiplied by just any (nonzero) scalar, with no effect. (Compare with Chap. 6, 
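Sect. 6.4.1.) In fact, we’re only interested in the differential form: 

$$\begin{aligned}
\begin{pmatrix} dx \\ dy \\ c\,dt \end{pmatrix}
&= \begin{pmatrix} \gamma(\beta_u) & & \\ & 1 & \\ & & \gamma(\beta_u) \end{pmatrix} \begin{pmatrix} 1 & & \beta_u \\ & 1 & \\ \beta_u & & 1 \end{pmatrix} \begin{pmatrix} dx' \\ dy' \\ c\,dt' \end{pmatrix} \\
&= c\,dt' \begin{pmatrix} \gamma(\beta_u) & & \\ & 1 & \\ & & \gamma(\beta_u) \end{pmatrix} \begin{pmatrix} 1 & & \beta_u \\ & 1 & \\ \beta_u & & 1 \end{pmatrix} \begin{pmatrix} \beta_{v_{x'}} \\ \beta_{v_{y'}} \\ 1 \end{pmatrix}.
\end{aligned}$$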


We can now go ahead and divide by dt, to obtain the new slopes (or velocities) dx/dt 
and dy/dt, as observed from the second particle. (This way, we actually eliminate 
the x’-y’-t’ lab coordinates, and drop them.) 

Fig. 4.7 View from the second particle: the first particle gets away at a new velocity: (dx/dt, dy/dt) 
in the x-y-t system 

As observed from the second particle, the x-velocity of the first particle is still 

$$\frac{dx}{dt} = \frac{\beta_u + \beta_{v_{x'}}}{1 + \beta_u\beta_{v_{x'}}}\,c = \frac{u + v_{x'}}{1 + \beta_u\beta_{v_{x'}}},$$

as in Sect. 4.4.2. The y-velocity, on the other hand, is 

$$\frac{dy}{dt} = \frac{\beta_{v_{y'}}}{\gamma(\beta_u)\left(1 + \beta_u\beta_{v_{x'}}\right)}\,c = \frac{v_{y'}}{\gamma(\beta_u)\left(1 + \beta_u\beta_{v_{x'}}\right)}.$$

Thus, as observed from the second particle, the first particle indeed makes the path 
in Fig. 4.7. To draw it, we now have the new slopes dx/dt and dy/dt in terms of 
three known parameters: u, $v_{x'}$, and $v_{y'}$. 
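
Here is a minimal Python sketch of these two formulas, in scaled velocities (units where c = 1). The function name `transform_velocity` and the sample values are assumptions for illustration; the example takes a first particle that moves purely in the y’ direction in the lab.

```python
import math

def transform_velocity(beta_u, beta_vx, beta_vy):
    """Scaled velocity of the first particle, as seen from the second one.

    beta_u: scaled speed of the lab relative to the second particle.
    (beta_vx, beta_vy): scaled velocity of the first particle in the lab.
    """
    g = 1.0 / math.sqrt(1.0 - beta_u * beta_u)
    denom = 1.0 + beta_u * beta_vx
    return (beta_u + beta_vx) / denom, beta_vy / (g * denom)

bx, by = transform_velocity(0.6, 0.0, 0.5)
print(bx, by)                 # close to (0.6, 0.4)
print(math.hypot(bx, by))     # total scaled speed, still below 1
```

Note how the perpendicular component shrinks from 0.5 to about 0.4, although the second particle moves in the x direction only.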


4.5 Momentum and Energy 


4.5.1 Conservation of Momentum 


Consider a particle of mass m, moving rightwards at velocity u with respect to the 
lab. Suddenly, the particle explodes, and splits into two new subparticles of mass 
m/2 each. Thanks to symmetry, with respect to the original particle, one subparticle 
flies rightwards at the extra velocity of v, while the other flies leftwards at the extra 
velocity of v (Fig. 4.8). 

In Newtonian mechanics, the momentum is defined as the mass times the velocity 
(Chap. 2, Sect. 2.4.1). This is the linear momentum in the x-spatial dimension. In 
these terms, the momentum indeed remains the same: 

$$\frac{m}{2}(u + v) + \frac{m}{2}(u - v) = mu.$$

Fig. 4.8 Conservation of momentum: after the original particle (top picture) explodes and splits 
into two subparticles (bottom picture), the total momentum is still $mu\gamma(\beta_u)$, as before. (Before 
the explosion: one particle at velocity u. After the explosion: two subparticles at velocities 
$(u - v)/(1 - \beta_u\beta_v)$ and $(u + v)/(1 + \beta_u\beta_v)$.) 

Thanks to special relativity, however, we already know that this is not the correct 
way to add velocities (Sect. 4.1.1). To fix this, let’s redefine momentum in a more 
accurate way: 

$$mu\gamma(\beta_u)$$


rather than just mu. This is indeed a relative definition: it defines the momentum of 
the original particle not absolutely but only relative to the lab. 

This new definition is indeed a natural extension of the old one. After all, for a 
small velocity $|u| \ll c$, little has changed: since $|\beta_u| \ll 1$, we also have $\gamma(\beta_u) \approx 1$, so 

$$mu\gamma(\beta_u) \approx mu.$$


For a large velocity u, on the other hand, the new definition is an important improve- 
ment. Indeed, to make the explosion happen, some energy is required, which must 
come from somewhere. Now, our system is isolated: no force or energy could come 
from the outside. Therefore, the energy for the explosion must come from mass: the 
subparticles must lose some of their original mass. More precisely, each subparticle 
has mass 

$$\frac{m}{2\gamma(\beta_v)} < \frac{m}{2\gamma(0)} = \frac{m}{2}$$

(Sect.4.5.3 below). What is the physical meaning of this inequality? Well, it says 
that the mass after the explosion must be less than before the explosion, when the 
subparticle was still inside the particle, and had velocity v = 0 with respect to it. 
Still, no mass was lost for nothing: it supplied the extra energy required to make the 
explosion happen. 

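Let’s assemble the so-called momentum matrix: mass times Lorentz matrix, 
summed over both subparticles. (Later on, we’ll focus on just one element in it, 
to obtain the desired momentum.) Thanks to the composition in Sect. 4.2.4, 

$$\begin{aligned}
\sum \text{mass} \cdot \text{Lorentz matrix}
&= \frac{m}{2\gamma(\beta_v)}\,\gamma\!\left(\frac{\beta_u + \beta_v}{1 + \beta_u\beta_v}\right)\left(I + \frac{\beta_u + \beta_v}{1 + \beta_u\beta_v} J\right) \\
&\quad + \frac{m}{2\gamma(\beta_v)}\,\gamma\!\left(\frac{\beta_u - \beta_v}{1 - \beta_u\beta_v}\right)\left(I + \frac{\beta_u - \beta_v}{1 - \beta_u\beta_v} J\right) \\
&= \frac{m}{2\gamma(\beta_v)}\,\gamma(\beta_v)(I + \beta_v J)\,\gamma(\beta_u)(I + \beta_u J) \\
&\quad + \frac{m}{2\gamma(\beta_v)}\,\gamma(\beta_v)(I - \beta_v J)\,\gamma(\beta_u)(I + \beta_u J) \\
&= \frac{m}{2}\left(I + \beta_v J + I - \beta_v J\right)\gamma(\beta_u)(I + \beta_u J) \\
&= m\gamma(\beta_u)(I + \beta_u J).
\end{aligned}$$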


Thus, after the explosion, the momentum matrix remains the same (with respect to 
the lab). This is indeed conservation of momentum in matrix sense. 

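The above is a matrix equation: it actually contains four scalar equations. Let’s 
look at just one of them, say the upper right one: 

$$\frac{m}{2\gamma(\beta_v)}\,\gamma\!\left(\frac{\beta_u + \beta_v}{1 + \beta_u\beta_v}\right)\frac{\beta_u + \beta_v}{1 + \beta_u\beta_v} + \frac{m}{2\gamma(\beta_v)}\,\gamma\!\left(\frac{\beta_u - \beta_v}{1 - \beta_u\beta_v}\right)\frac{\beta_u - \beta_v}{1 - \beta_u\beta_v} = m\gamma(\beta_u)\beta_u.$$

To simplify, let’s multiply this by c: 

$$\frac{m}{2\gamma(\beta_v)}\,\gamma\!\left(\frac{\beta_u + \beta_v}{1 + \beta_u\beta_v}\right)\frac{u + v}{1 + \beta_u\beta_v} + \frac{m}{2\gamma(\beta_v)}\,\gamma\!\left(\frac{\beta_u - \beta_v}{1 - \beta_u\beta_v}\right)\frac{u - v}{1 - \beta_u\beta_v} = m\gamma(\beta_u)u.$$

This is indeed conservation of momentum: after the explosion, relative to the lab, 
mass times the velocity times the relevant $\gamma$ still sums to the same: $mu\gamma(\beta_u)$. 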
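
This conservation law is easy to test with numbers. The Python sketch below works in units where c = 1 (so velocities are the scaled ones), and its function names and sample values are assumptions for illustration only. It uses the reduced subparticle mass $m/(2\gamma(\beta_v))$ from above.

```python
import math

def gamma(beta):
    """Scaling factor 1/sqrt(1 - beta^2), defined for |beta| < 1."""
    return 1.0 / math.sqrt(1.0 - beta * beta)

def add_velocities(beta_a, beta_b):
    """Relativistic sum of two scaled velocities."""
    return (beta_a + beta_b) / (1.0 + beta_a * beta_b)

def momentum(mass, beta):
    """Relative momentum: mass * velocity * gamma, with c = 1."""
    return mass * beta * gamma(beta)

m, beta_u, beta_v = 2.0, 0.5, 0.3
sub_mass = m / (2.0 * gamma(beta_v))           # mass of each subparticle
beta_right = add_velocities(beta_u, beta_v)    # subparticle flying rightwards
beta_left = add_velocities(beta_u, -beta_v)    # subparticle flying leftwards

before = momentum(m, beta_u)
after = momentum(sub_mass, beta_right) + momentum(sub_mass, beta_left)
print(before, after)   # equal up to rounding
```

With the naive mass m/2 in place of `sub_mass`, the two printed numbers would no longer agree.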


4.5.2 Relative Energy 


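So far, we’ve defined the relative momentum, and made sure that it is indeed 
conserved. What about relative energy? To help define it too, we’ll need to differentiate 
$\gamma(\beta_v)$: 

$$\begin{aligned}
\gamma'(\beta_v) &= \left(\left(1 - \beta_v^2\right)^{-1/2}\right)' \\
&= -\frac{1}{2}\left(1 - \beta_v^2\right)^{-3/2}(-2\beta_v) \\
&= \beta_v\left(1 - \beta_v^2\right)^{-3/2}.
\end{aligned}$$

Let’s use this to differentiate $\gamma(\beta_v)$ as a composite function of v: 

$$\frac{d}{dv}\gamma(\beta_v) = \frac{d\beta_v}{dv}\,\gamma'(\beta_v) = \frac{1}{c}\,\gamma'(\beta_v) = \frac{\beta_v}{c}\left(1 - \beta_v^2\right)^{-3/2} = \frac{v}{c^2}\left(1 - \beta_v^2\right)^{-3/2}.$$

Let’s use this to differentiate the product $v\gamma(\beta_v)$: 

$$\begin{aligned}
\frac{d}{dv}\left(v\gamma(\beta_v)\right) &= \gamma(\beta_v) + v\,\frac{d}{dv}\gamma(\beta_v) \\
&= \gamma(\beta_v) + \frac{v^2}{c^2}\left(1 - \beta_v^2\right)^{-3/2} \\
&= \gamma(\beta_v) + \beta_v^2\left(1 - \beta_v^2\right)^{-3/2} \\
&= \gamma^{-2}(\beta_v)\,\gamma^3(\beta_v) + \beta_v^2\left(1 - \beta_v^2\right)^{-3/2} \\
&= \left(1 - \beta_v^2\right)\left(1 - \beta_v^2\right)^{-3/2} + \beta_v^2\left(1 - \beta_v^2\right)^{-3/2} \\
&= \left(1 - \beta_v^2\right)^{-3/2}.
\end{aligned}$$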


We are now ready to define relative energy. This definition will be accurate not only 
for small but also for large velocities. 

Consider a particle of mass m that is initially at rest in the x-t coordinates in 
the lab. Then, an external force F is applied to it from time 0 until time g > 0, 
to increase both its momentum and energy. Thanks to the increase in its energy, 
the particle doesn’t have to lose any mass: it may remain with the same mass m 
throughout the entire time interval [0, q]. 

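To have the force, we need to differentiate the relative momentum in Sect. 4.5.1 
with respect to time. This will help define relative energy: 

$$\begin{aligned}
\int_{x(0)}^{x(q)} F(x)\,dx &= \int_0^q F(x(t))\,dx(t) \\
&= \int_0^q F(x(t))\,\frac{dx}{dt}(t)\,dt \\
&= \int_0^q F(x(t))\,v(t)\,dt \\
&= m\int_0^q \frac{d}{dt}\left(v\gamma(\beta_v)\right)v(t)\,dt \\
&= m\int_0^q \frac{d}{dv}\left(v\gamma(\beta_v)\right)\frac{dv}{dt}\,v(t)\,dt \\
&= m\int_{v(0)}^{v(q)} \frac{d}{dv}\left(v\gamma(\beta_v)\right)v\,dv \\
&= m\int_{v(0)}^{v(q)} \left(1 - \beta_v^2\right)^{-3/2}v\,dv \\
&= mc^2\left(\gamma(\beta_{v(q)}) - \gamma(\beta_{v(0)})\right) \\
&= mc^2\left(\gamma(\beta_{v(q)}) - 1\right).
\end{aligned}$$

This is indeed the new kinetic energy that the external force has introduced into the 
particle from time 0 until time q. 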
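
The last integral can be double-checked by numerical integration. The sketch below, in units where c = 1, approximates $m\int_0^{v(q)} (1 - v^2)^{-3/2}\,v\,dv$ with a simple midpoint rule and compares it with the closed form $m(\gamma - 1)$. The function name `work_done`, the number of steps, and the sample values are assumptions for illustration; this is just one plausible way to do the check.

```python
import math

def gamma(beta):
    """Scaling factor 1/sqrt(1 - beta^2), defined for |beta| < 1."""
    return 1.0 / math.sqrt(1.0 - beta * beta)

def work_done(mass, beta_final, steps=10_000):
    """Midpoint-rule value of mass * integral of v*(1 - v^2)^(-3/2) dv
    from 0 to beta_final, in units where c = 1."""
    h = beta_final / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * h                 # midpoint of the k-th subinterval
        total += v * (1.0 - v * v) ** -1.5
    return mass * total * h

m, beta_q = 1.0, 0.6
numeric = work_done(m, beta_q)
closed_form = m * (gamma(beta_q) - 1.0)   # m c^2 (gamma - 1) with c = 1
print(numeric, closed_form)               # both close to 0.25
```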

The potential (nuclear) energy stored in the particle at rest, on the other hand, is 
not relative, but absolute. This is the amount subtracted in the above formula: 

$$E_{\text{potential}} \equiv E(0) \equiv mc^2.$$

With this original energy, the total energy is available as a smooth function of v: 

$$E(v) \equiv E_{\text{potential}} + E_{\text{kinetic}}(v) = mc^2 + mc^2\left(\gamma(\beta_v) - 1\right) = mc^2\gamma(\beta_v).$$

This new definition is closely related to the momentum matrix 

$$m\gamma(\beta_v)(I + \beta_v J).$$

Indeed, just look at the lower right corner, and multiply by $c^2$. 

The new definition improves on the old one: it is accurate not only for small but 
also for large velocities. In fact, for a small velocity $|v| \ll c$, it is nearly the same as 
the well-known (inaccurate) formula. Indeed, from the Taylor expansion around 0, 
we have that, for $|\beta_v| \ll 1$, 

$$\gamma(\beta_v) = \left(1 - \beta_v^2\right)^{-1/2} \approx 1 + \frac{1}{2}\beta_v^2.$$

Therefore, for $|v| \ll c$, the new definition nearly agrees with the classical one: 

$$E(v) = mc^2\gamma(\beta_v) \approx mc^2\left(1 + \frac{1}{2}\beta_v^2\right) = mc^2 + \frac{m}{2}v^2.$$

If, however, no external force has been applied to it, could the particle still have a 
nonzero velocity $v \ne 0$? Well, it could, but, in this case, its kinetic energy must have 
come from somewhere: from its original potential energy. For this, there is a price 
to pay: the particle must have lost some mass to start moving. Only if it gave up 
motion altogether and remained at a complete rest (v = 0) could it keep its original 
(maximal) mass. 
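
How good is the classical formula in practice? The Python sketch below compares the relative kinetic energy $mc^2(\gamma - 1)$ with the classical $mv^2/2$ at a small and at a large scaled velocity, in units where m = c = 1. The function names and sample velocities are assumptions for illustration.

```python
import math

def kinetic_relativistic(beta):
    """Kinetic energy gamma(beta) - 1, in units where m = c = 1."""
    return 1.0 / math.sqrt(1.0 - beta * beta) - 1.0

def kinetic_classical(beta):
    """Classical kinetic energy beta^2 / 2, in the same units."""
    return 0.5 * beta * beta

def relative_gap(beta):
    """Fraction of the relativistic kinetic energy missed by the classical formula."""
    rel = kinetic_relativistic(beta)
    return (rel - kinetic_classical(beta)) / rel

for beta in (0.01, 0.9):
    print(beta, kinetic_relativistic(beta), kinetic_classical(beta), relative_gap(beta))
```

At one percent of the speed of light, the two formulas differ by less than a hundredth of a percent; at ninety percent, the classical formula misses most of the kinetic energy.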


4.5.3 Energy Is Conserved—Mass Is Not 


In Sect.4.5.2, we’ve assumed that an external force is applied to the particle, to 
increase both its momentum and energy. This is why its mass remains m at all times. 

In Sect. 4.5.1, on the other hand, the explosion takes place in a closed (isolated) 
system: no external force is applied. For this reason, not only the total momentum 
but also the total energy remains unchanged. 

During the explosion, where do the subparticles get their extra kinetic energy 
from? Well, it must come from the original potential (nuclear) energy, stored in the 
original particle. As a price, during the explosion, mass must decrease. 

As a matter of fact, this is true not only for an exploding particle but also for any 
particle that starts moving at velocity v, while preserving its total energy. Its new 
kinetic energy must come from somewhere: from its potential nuclear energy. As a 
price, the original mass m that the particle had at rest must decrease from m to 

$$m(v) \equiv \frac{m}{\gamma(\beta_v)} \le m.$$

This way, its total energy remains the same as at rest: 

$$E(v) = m(v)c^2\gamma(\beta_v) = \frac{m}{\gamma(\beta_v)}\,c^2\gamma(\beta_v) = mc^2 = E(0).$$

This is why the absolute quantity m is also called the rest mass: the maximal possible 
mass. At motion, on the other hand, the new mass m(v) gets smaller: 

$$m(v) = \frac{m}{\gamma(\beta_v)} \le m = \frac{m}{\gamma(0)} = m(0).$$

The above happens in an isolated system only: no external force is welcome, so the 
total energy remains constant, while the mass m(v) decreases as $|v|$ increases. This 
means that energy is never lost, but can only convert from potential to kinetic energy. 
In this process, the total energy remains the same: 

$$E_{\text{potential}} + E_{\text{kinetic}}(v) = \frac{mc^2}{\gamma(\beta_v)} + \frac{mc^2}{\gamma(\beta_v)}\left(\gamma(\beta_v) - 1\right) = mc^2.$$


This is quite different from the situation considered in Sect.4.5.2, in which the 
system is not isolated, and welcomes an external force from the outside. In that case, 
the kinetic energy increases with |v|, while the mass remains constant. This was 
necessary to help define the kinetic energy obtained from the work that the external 
force does. 

Mass, on the other hand, may change even in a closed system. In fact, as |v| 
increases, the mass m(v) decreases. In the extreme case of |v| = c, for instance, we
have γ = ∞, so the particle has no mass at all: all its potential energy has already
been exploited, and converted into kinetic energy. 
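
The bookkeeping above can be checked numerically. The following is only a minimal sketch: the function names and the sample values are assumptions made for illustration, not part of the text.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def gamma(beta):
    """Lorentz factor for beta = v/c, with |beta| < 1."""
    return 1.0 / math.sqrt(1.0 - beta * beta)


def moving_mass(m, v):
    """Mass m(v) = m / gamma(beta_v) of a particle in an isolated system."""
    return m / gamma(v / C)


def total_energy(m, v):
    """E(v) = m(v) c^2 gamma(beta_v): should equal m c^2 for every v."""
    return moving_mass(m, v) * C**2 * gamma(v / C)


def potential_energy(m, v):
    """Potential (nuclear) energy m(v) c^2 that is still stored in the particle."""
    return moving_mass(m, v) * C**2


def kinetic_energy(m, v):
    """Whatever is left of the total energy once the potential part is removed."""
    return total_energy(m, v) - potential_energy(m, v)


# Illustrative values (assumed): rest mass 2 kg, speed 0.6 c, so gamma = 1.25.
m_rest, speed = 2.0, 0.6 * C
```

With these sample values, the mass drops from 2 to 1.6, while the potential and kinetic parts still sum to the rest energy.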


4.5.4 Lorentz Transformation on Momentum—Energy 


In their new definitions in Sects. 4.5.1-4.5.2, both energy and momentum are relative: 
they depend on the velocity v, which may change from system to system. Consider, 
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for instance, a particle of mass m, moving rightwards at velocity v with respect to the 
lab. To describe the lab, use again the x’-t’ coordinates (as in Sect. 4.4.2). This prime 
means no differentiation—it just reminds us that both x’ and t’ are measured in the 
lab. Thus, in the lab, the momentum of the particle is denoted with a prime as well: 


$$p' = m v \gamma(\beta_v).$$

Likewise, in the lab, the total energy of the particle is denoted with a prime as well:

$$E' = mc^2 \gamma(\beta_v).$$


Why are we using primes here? Because we are not really interested in the lab. We are 
more interested in yet another particle, moving leftwards at velocity u with respect 
to the lab. To describe its own self-system, use the standard x-t coordinates, with no 
prime (as in Sect.4.4.2). After all, our convention is to use no prime in the system 
we’re interested in. So, in the self-system of the second particle, what are the energy 
E and the momentum p of the first particle? 

Of course, we could take a naive approach: use the rule of adding velocities to 
calculate the velocity of the first particle away from the second one. Then, use it to 
calculate the relative momentum and energy as well. Still, this would require a lot of 
calculations. Is there a more direct way? 

Fortunately, there is. For this purpose, observe that p’ and E’/c have a familiar
ratio:

$$\frac{p'}{E'/c} = \frac{m v \gamma(\beta_v)}{m c \gamma(\beta_v)} = \frac{v}{c} = \beta_v.$$


So, let’s put them in one column vector, proportional to the column $(\beta_v, 1)^t$ in
Sect. 4.4.2. More precisely, this is just the second column in the momentum matrix,
multiplied by c:

$$\begin{pmatrix} p' \\ E'/c \end{pmatrix} = c m \gamma(\beta_v) (I + \beta_v J) \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$


From the perspective of the second particle, on the other hand, the entire lab moves 
rightwards at velocity u. On top of that, the first particle also travels in the lab at 
velocity v. Thus, the momentum matrix is now 


$$m \gamma(\beta_u) (I + \beta_u J) \, \gamma(\beta_v) (I + \beta_v J).$$


To have the energy and momentum of the first particle from the perspective of the 
second one, just look at the second column of this new momentum matrix, and 
multiply by c: 


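$$\begin{aligned}
\begin{pmatrix} p \\ E/c \end{pmatrix} &= c m \gamma(\beta_u) (I + \beta_u J) \, \gamma(\beta_v) (I + \beta_v J) \begin{pmatrix} 0 \\ 1 \end{pmatrix} \\
&= \gamma(\beta_u) (I + \beta_u J) \begin{pmatrix} p' \\ E'/c \end{pmatrix}.
\end{aligned}$$
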
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So, to drop the primes and have both p and E for free, just apply the inverse Lorentz 
transformation in Sect. 4.2.5. This way, you work with energy and momentum only, 
avoiding the explicit transformation of the entire lab back to the self-system of the 
second particle. As a result, both the energy and the momentum of the first particle 
are now available not only with respect to the lab but also with respect to the second 
particle, with no need to add the velocities u and v explicitly any more. 
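
To see that the direct route agrees with the naive one, here is a small numerical sketch in units where c = 1. The helper names and the sample values are hypothetical, chosen only for illustration. The boost multiplies the column (p’, E’/c) by γ(β_u)(I + β_u J), and the result is compared with what the rule of adding velocities gives.

```python
import math


def gamma(beta):
    """Lorentz factor for beta = v/c."""
    return 1.0 / math.sqrt(1.0 - beta * beta)


def momentum_energy(m, beta):
    """Return the pair (p, E/c) of a particle of rest mass m, in units where c = 1."""
    g = gamma(beta)
    return m * beta * g, m * g


def boost(beta_u, p_lab, e_lab):
    """Apply gamma(beta_u) (I + beta_u J) to the column (p', E'/c)."""
    g = gamma(beta_u)
    return g * (p_lab + beta_u * e_lab), g * (beta_u * p_lab + e_lab)


def add_velocities(beta_u, beta_v):
    """Relativistic rule of adding two collinear velocities (in units of c)."""
    return (beta_u + beta_v) / (1.0 + beta_u * beta_v)


# Illustrative values (assumed): the first particle moves at 0.6 c in the lab,
# and the second particle moves leftwards at 0.5 c.
m, beta_v, beta_u = 1.0, 0.6, 0.5
direct = boost(beta_u, *momentum_energy(m, beta_v))
naive = momentum_energy(m, add_velocities(beta_u, beta_v))
```

Both routes give the same pair (p, E/c); the direct one never forms the relative velocity explicitly.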


4.6 Energy and Mass 


4.6.1 Absolute Nuclear Energy 


In the above, we put the energy and the momentum in the second column of the 
momentum matrix. Let’s go ahead and do this in the first column as well: 


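$$\begin{pmatrix} E'/c & p' \\ p' & E'/c \end{pmatrix} = c m \gamma(\beta_v) (I + \beta_v J).$$

Fortunately, a Lorentz matrix must have determinant 1:

$$\begin{aligned}
\frac{E'^2}{c^2} - p'^2 &= \det\left( \begin{pmatrix} E'/c & p' \\ p' & E'/c \end{pmatrix} \right) \\
&= \det\left( c m \gamma(\beta_v) (I + \beta_v J) \right) \\
&= c^2 m^2 \det\left( \gamma(\beta_v) (I + \beta_v J) \right) \\
&= m^2 c^2.
\end{aligned}$$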


As in Sect.4.5.4, to drop the primes, we have to apply yet another Lorentz matrix. 
This has no effect on the determinant: 


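$$\begin{aligned}
\frac{E^2}{c^2} - p^2 &= \det\left( \begin{pmatrix} E/c & p \\ p & E/c \end{pmatrix} \right) \\
&= \det\left( \gamma(\beta_u) (I + \beta_u J) \begin{pmatrix} E'/c & p' \\ p' & E'/c \end{pmatrix} \right) \\
&= \det\left( \begin{pmatrix} E'/c & p' \\ p' & E'/c \end{pmatrix} \right) \\
&= m^2 c^2.
\end{aligned}$$

Thus, the determinant is invariant and absolute: it doesn’t depend on u, and doesn’t
change from system to system. To simplify, multiply by $c^2$:

$$E^2 - c^2 p^2 = m^2 c^4.$$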




This is indeed the squared nuclear energy stored in the particle at rest (Sects. 4.5.2— 
4.5.3). This energy is not relative, but absolute: it is completely independent of the 
velocity, or the system used to measure it. 

In general, the momentum p might be a three-dimensional vector rather than just
a scalar. In this case, $p^2$ should be replaced by the inner product $\|p\|^2 = (p, p)$.


4.6.2. Invariant Rest Mass 


What is the physical meaning of this formula? It tells us that the rest mass m is 
invariant: it remains the same in all systems, and is never changed under any Lorentz 
transformation. For this reason, m could be calculated not only in the original lab 
but also in any other system. 

The above discussion is thus most practical. Thanks to it, m could be calculated 
not only from p’ and E’ (the momentum and energy in the lab) but also from p
and E (the momentum and energy in the self-system of the second particle). This
observation will be most useful below. 

In particular, why not calculate m in the self-system of the first particle itself? 
After all, in this system, there is no velocity or momentum or kinetic energy at all, 
so the above formula simplifies to read 

$$E_{\text{potential}}^2 = m^2 c^4,$$

or

$$E_{\text{potential}} = mc^2,$$


Einstein’s famous formula. 
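
As a quick illustration of the invariance, the rest mass can be recovered from the pair (p, E) measured in any system. This is a minimal sketch in units where c = 1 by default; the function names and the numbers are assumptions, not taken from the text.

```python
import math


def gamma(beta):
    """Lorentz factor for beta = v/c."""
    return 1.0 / math.sqrt(1.0 - beta * beta)


def rest_mass(p, e, c=1.0):
    """Solve E^2 - c^2 p^2 = m^2 c^4 for the rest mass m >= 0."""
    return math.sqrt(e * e - (c * p) ** 2) / c**2


def boost(beta_u, p, e, c=1.0):
    """Apply gamma(beta_u) (I + beta_u J) to the column (p, E/c); return (p, E)."""
    g = gamma(beta_u)
    p_new = g * (p + beta_u * e / c)
    e_new = g * (beta_u * p * c + e)
    return p_new, e_new


# Illustrative values (assumed): m = 3 and beta_v = 0.8 give gamma = 5/3,
# so p' = 4 and E' = 5 in the lab.
p_lab, e_lab = 4.0, 5.0
p_new, e_new = boost(0.5, p_lab, e_lab)
```

Both `rest_mass(p_lab, e_lab)` and `rest_mass(p_new, e_new)` return 3 (up to rounding), although p and E themselves change under the boost.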


4.7 Center of Mass 


4.7.1 Collection of Subparticles 


In the lab, if the velocity v of the first particle is unknown, then it could still be 
obtained in terms of the momentum p’ and the energy E’: 


$$v = \frac{m v \gamma(\beta_v)}{m c^2 \gamma(\beta_v)} c^2 = \frac{p'}{E'} c^2.$$


When is this useful? When the momentum and energy are available, but the velocity 
is not. This is quite practical: p’ and E’ are more fundamental than v, which is often 
missing. 
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Throughout this chapter, the second particle could be just theoretical, and have 
no size or mass at all. After all, it only serves as a reference point for the first particle 
and its motion. 

The first particle, on the other hand, is more real and physical. To emphasize this,
let’s replace it by a collection of k > 1 subparticles, each with velocity $v_i$, momentum
$p'_i$, and energy $E'_i$ with respect to the lab ($1 \le i \le k$).

What are the total momentum and energy? Well, as fundamental (and conserved)
quantities, they sum up:

$$p' = \sum_{i=1}^k p'_i \quad \text{and} \quad E' = \sum_{i=1}^k E'_i.$$

The velocity of the entire collection, on the other hand, is not necessarily the sum of
the $v_i$’s. After all, the subparticles may have different masses, which are not always
available. As a matter of fact, some of them may even have no mass at all (those that
are as fast as light, and have $|v_i| = c$). To define the total velocity properly, better
use the fundamental relative quantities: momentum and energy.


4.7.2 Center of Mass 
We are now ready to define the velocity of the entire collection: 


$$v \equiv c^2 \frac{p'}{E'} = c^2 \frac{\sum_{i=1}^k p'_i}{\sum_{i=1}^k E'_i}.$$


This new velocity describes the motion of no concrete physical object, but only a 
theoretical object: the center of mass of the entire collection. Where is this? To tell 
this, let’s use the second particle. 

The above velocity is in terms of the lab. Next, let’s look at things from the second 
particle, which travels in the lab at velocity —u. In its self-system, the momentum 
and energy of the collection are 


$$\begin{pmatrix} p \\ E/c \end{pmatrix} = \gamma(\beta_u) (I + \beta_u J) \begin{pmatrix} p' \\ E'/c \end{pmatrix}$$

(Sect. 4.5.4).


In this equation, look at the top. Assume also that the second particle follows the 
center of mass at the same speed: 


$$u = -v = -c^2 \frac{p'}{E'}, \quad \text{so} \quad \beta_u = -\beta_v = -\frac{p'}{E'/c}.$$

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In this case, in the previous equation, the top simplifies to read 
p=0. 


Thus, with respect to the second particle, the entire collection has no momentum at 
all: it is at a complete rest. This is why the second particle marks the center of mass 
itself. 


4.7.3 Rest Mass of the Collection 


What is the rest mass m of the entire collection? Again, we can’t just sum the 
individual masses of the subparticles. Instead, we better work with more fundamental 
quantities: momentum and energy. 

In the self-system of the second particle, the collection has no momentum at all. 
Therefore, the formula in Sect. 4.6.1 tells us that

$$E^2 = m^2 c^4,$$

or

$$m = \frac{E}{c^2}.$$


This is indeed a proper definition of the total mass of the entire collection. 

What is the physical meaning of this? Well, in its own self-system (or the self- 
system of the center of mass), the collection has no momentum at all: p = 0. Thus, 
the above actually defines its rest mass m in terms of its total energy E’. Fortunately, 
mass is invariant, so m is the same in the original lab as well (Sect.4.6.2). Still, in 
the lab, the collection moves, so its true mass is no longer m but only $m/\gamma(\beta_v) < m$
(Sect. 4.5.3). 
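
The definitions in Sects. 4.7.1-4.7.3 can be tried on a small example. The sketch below works in units where c = 1; the function name and the sample collection (one massive subparticle at rest plus one massless subparticle) are assumptions chosen for illustration.

```python
import math


def gamma(beta):
    """Lorentz factor for beta = v/c."""
    return 1.0 / math.sqrt(1.0 - beta * beta)


def center_of_mass(subparticles):
    """subparticles: list of (p_i, E_i) pairs measured in the lab, with c = 1.

    Returns (v, m, p_cm): the velocity of the center of mass in the lab,
    the rest mass of the collection, and its momentum as seen from the
    center of mass (which should vanish).
    """
    p_total = sum(p for p, _ in subparticles)
    e_total = sum(e for _, e in subparticles)
    v = p_total / e_total                 # v = c^2 p'/E' with c = 1
    g = gamma(v)
    # Boost with beta_u = -beta_v, as in Sect. 4.7.2.
    p_cm = g * (p_total - v * e_total)
    e_cm = g * (e_total - v * p_total)
    return v, e_cm, p_cm                  # with p = 0, m = E/c^2 = E


# Illustrative collection (assumed): a subparticle of rest mass 1 at rest
# (p = 0, E = 1) and a massless one moving at the speed of light (p = 1, E = 1).
v_cm, m_total, p_cm = center_of_mass([(0.0, 1.0), (1.0, 1.0)])
```

Here the center of mass moves at half the speed of light, and the rest mass of the collection is the square root of 3, not the sum 1 + 0 of the individual rest masses.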


4.8 Force 


4.8.1 Passive System—Strong Perpendicular Force 


As in Sect. 4.4.3, assume now that the lab is described by the x’-y’-t’ coordinates. In
the lab, the entire collection moves obliquely, at the new velocity $v = (v_{x'}, v_{y'})^t$: $v_{x'}$
in the positive x’-direction, and $v_{y'}$ in the perpendicular y’-direction. In this case, the
momentum is oblique as well: $p' = (p'_{x'}, p'_{y'})^t$, proportional to v. This is the view
from the lab (Fig. 4.9). (From the second particle, on the other hand, things look
different: see Fig. 4.10.)

Fig. 4.9 View from the lab: initially, at time t’ = 0, the collection is still at rest at
(x’, y’) = (0, 0). At t’ = 0, an oblique external force $F' = (F'_{x'}, F'_{y'})$ starts to act
upon it, to increase its momentum and kinetic energy, while not changing its mass

Fig. 4.10 View from the second particle: the force that acts on the collection
remains the same in the x-direction, but seems weaker in the perpendicular y-direction


Note that there is no differentiation here: the prime means no derivative, but only
reminds us that we are in the lab system. Also, the subscripts x’ and y’ mean no partial
derivative, but only spatial coordinates.

Thus, in the formula in Sect. 4.6.1, $p'^2$ should be replaced by the inner product
$\|p'\|^2 = (p', p')$. After all, in theory, we could always redefine x to align with v and
p’ (see exercises below). Fortunately, there is no need to do this explicitly.

The second particle, on the other hand, still moves in the x’-direction only: at
velocity (−u, 0) ≠ (0, 0) with respect to the lab. To transform from the lab to the
self-system of the second particle, we must now use an extended 3 x 3 Lorentz
matrix:

$$\begin{pmatrix} p_x \\ p_y \\ E/c \end{pmatrix} = \begin{pmatrix} \gamma(\beta_u) & & \\ & 1 & \\ & & \gamma(\beta_u) \end{pmatrix} \begin{pmatrix} 1 & & \beta_u \\ & 1 & \\ \beta_u & & 1 \end{pmatrix} \begin{pmatrix} p'_{x'} \\ p'_{y'} \\ E'/c \end{pmatrix}.$$

Here, we no longer assume that u = −v. Thus, the second particle no longer coincides
with the center of mass. Instead, this job is left to the lab itself.
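Indeed, assume now that the lab system is initially the same as the self-system of
the collection: at time t’ = 0, the collection is at rest in the lab:

$$p' = \begin{pmatrix} p'_{x'} \\ p'_{y'} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad E' = mc^2 > 0, \quad \text{and} \quad v = \begin{pmatrix} v_{x'} \\ v_{y'} \end{pmatrix} = \frac{c^2}{E'} p' = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$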


This is true at time t’ = 0 only. At t’ > 0, on the other hand, things may change, due 
to an external force. 

Unlike before, assume now that the lab is no longer closed or isolated. On the 
contrary: from time t = t’ = 0 onward, an external force is applied to the entire
collection, to increase its momentum and (kinetic) energy in the lab, while preserving 
mass. This is why the lab is called here the passive system: after all, the original force 
is applied directly to the collection that was initially at rest in it. 

In the passive system, this force could be measured, and is often available: the 
derivative of the momentum in the original x’-y’-t’ coordinates: 


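$$F' \equiv \begin{pmatrix} F'_{x'} \\ F'_{y'} \end{pmatrix} = \begin{pmatrix} \dfrac{dp'_{x'}}{dt'} \\ \dfrac{dp'_{y'}}{dt'} \end{pmatrix}.$$

Let’s focus on the force at the initial time t’ = 0:

$$F' = \begin{pmatrix} F'_{x'} \\ F'_{y'} \end{pmatrix} = \begin{pmatrix} \dfrac{dp'_{x'}}{dt'}(0) \\ \dfrac{dp'_{y'}}{dt'}(0) \end{pmatrix}.$$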


(After all, every time t’ > 0 could in theory be shifted back to zero.) What does this
force look like from the second particle? In other words, how to transform the force
to the x-y-t coordinates in the self-system of the second particle?

Of course, we could take a naive approach: transform the momentum and energy to
the self-system of the second particle, and define the force F there by differentiating
the momentum with respect to t. Still, this could require a lot of calculations. Is there
a more direct way?

Fortunately, there is. To differentiate the momentum with respect to time, let’s 
use the trick in Sect. 4.4.3. 

To start, let’s differentiate t with respect to t’. This seems easy: after all, in the 
lab, t' is the proper time, isn’t it? So, as in Sect. 4.3.1, it should satisfy 


$$t' = \frac{t}{\gamma(\beta_u)},$$

shouldn’t it? Furthermore, as in Sect. 4.3.6, it should satisfy time dilation:

$$\Delta t' = \frac{\Delta t}{\gamma(\beta_u)},$$

shouldn’t it?
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Unfortunately not. After all, t’ might be proper only in an isolated lab, which 
welcomes no external force. In our lab, on the other hand, t’ is only nearly proper: 
only at t’ = 0, before the force had time to act, does t’ behave like a proper time. 

To see this, let’s use the inverse Lorentz transformation: 


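$$ct = \gamma(\beta_u) (\beta_u x' + ct'),$$

or

$$t = \gamma(\beta_u) \left( \frac{\beta_u}{c} x' + t' \right).$$

Let’s differentiate this with respect to t’:

$$\frac{dt}{dt'} = \gamma(\beta_u) \left( \frac{\beta_u}{c} \frac{dx'}{dt'} + 1 \right) = \gamma(\beta_u) \left( \frac{\beta_u}{c} v_{x'} + 1 \right).$$

At t’ = 0, in particular, $v_{x'} = 0$, so this simplifies to read

$$\frac{dt}{dt'} = \gamma(\beta_u).$$
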
Thus, at t’ = 0, t’ is indeed nearly proper: it behaves just like a proper time. Let’s 
use this to look at the force from the second particle as well (Fig. 4.10). 
Let’s start with the perpendicular component: $F'_{y'}$. From the above 3 x 3 matrix,
in the y-direction, the momentum is still the same:

$$p_y = p'_{y'}.$$

Thus, the differentiation is simple:

$$F_y = \frac{dp_y}{dt} = \frac{dp'_{y'}}{dt} = \frac{dp'_{y'}}{\gamma(\beta_u) \, dt'} = \frac{1}{\gamma(\beta_u)} \frac{dp'_{y'}}{dt'} = \frac{1}{\gamma(\beta_u)} F'_{y'}.$$

Thus, in the perpendicular y-direction, the passive system feels the maximal force.
From any other system, on the other hand, the force feels weaker. In particular, the
self-system of the second particle feels a force that is weaker by a factor of $\gamma(\beta_u)$.

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What about the force in the x-direction? Does it also feel weaker? Well, let’s use 
the same trick: 


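$$\begin{aligned}
F_x = \frac{dp_x}{dt} &= \frac{d \left( \gamma(\beta_u) \left( p'_{x'} + \beta_u E'/c \right) \right)}{\gamma(\beta_u) \, dt'} \\
&= \frac{d \left( p'_{x'} + \beta_u E'/c \right)}{dt'} \\
&= \frac{dp'_{x'}}{dt'} + \frac{\beta_u}{c} \frac{dE'}{dt'} \\
&= F'_{x'} + \frac{\beta_u}{c} \frac{dE'}{dt'}.
\end{aligned}$$

To simplify this, let’s look at the latter term, and show that it contributes nothing.
For this purpose, let’s look at the original equation

$$E'^2 = c^2 (p', p') + m^2 c^4.$$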


This equation comes from the original definition of E’ and p’ in the lab. Therefore, 
it holds for every time t’ > 0, although possibly with a different E’, p’, and v. Later 
on, we’ll focus on the initial time t’ = 0 once again. 

In this equation, the latter term remains constant. After all, thanks to the external
force, the potential energy (and the mass) remain unchanged. So, once differentiating
both sides with respect to t’, the latter term drops:

$$2E' \frac{dE'}{dt'} = 2c^2 (p')^t \frac{dp'}{dt'} = 2c^2 (p')^t F',$$

where $(p')^t$ is the row vector that contains the momentum in the lab.
Recall again that we’re particularly interested in the initial time of t’ = 0. At
t’ = 0, the momentum is still zero:

$$2E' \frac{dE'}{dt'} = 2c^2 (0, 0) F' = 0.$$

Since E’ > 0, we must therefore have

$$\frac{dE'}{dt'} = 0.$$


In summary, at t = t’ = 0,

$$F_x = F'_{x'} + \frac{\beta_u}{c} \frac{dE'}{dt'} = F'_{x'}.$$

Thus, unlike $F_y$, $F_x$ remains the same in all systems.
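
Both conclusions can be checked by a finite-difference experiment. The sketch below is only an illustration under stated assumptions: units with c = 1, a force that stays constant in the lab during a short time h, and hypothetical names and values. It transforms the momentum and the time to the self-system of the second particle, and divides the increments.

```python
import math


def gamma(beta):
    """Lorentz factor for beta = v/c."""
    return 1.0 / math.sqrt(1.0 - beta * beta)


def force_seen_by_second_particle(fx, fy, beta_u, m=1.0, h=1e-6):
    """Finite-difference estimate of (F_x, F_y) at t = t' = 0, with c = 1.

    (fx, fy) is the force in the lab, assumed constant during the short
    lab time h; the collection of rest mass m starts at rest at the origin.
    """
    g = gamma(beta_u)
    f2 = fx * fx + fy * fy
    # Lab quantities at the small lab time t' = h.
    px, py = fx * h, fy * h
    e = math.sqrt(px * px + py * py + m * m)
    # Work done by the force equals the energy gained: E' - m = (F', displacement).
    # Starting from rest, the displacement is parallel to F'.
    x = (e - m) * fx / f2
    # Transform momentum and time to the self-system of the second particle.
    px_new = g * (px + beta_u * e)
    t_new = g * (beta_u * x + h)
    px_start = g * beta_u * m      # p_x at t' = 0, when p' = 0 and E' = m
    return (px_new - px_start) / t_new, py / t_new


# Illustrative values (assumed): F' = (0.3, 0.4) and beta_u = 0.6, so gamma = 1.25.
Fx, Fy = force_seen_by_second_particle(0.3, 0.4, 0.6)
```

Up to an error of order h, the estimate returns F_x = 0.3 (unchanged) and F_y = 0.4/1.25 = 0.32 (weaker by the factor γ(β_u)).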


4.8.2. Photon: A New Universe? 


A light ray may be viewed in two different ways. On one hand, it is a wave. On the 
other hand, it is also a particle: a photon, traveling at speed c with respect to us. 

From our perspective, a particle as fast as light may have no mass at all (end 
of Sect.4.5.3). Furthermore, due to length contraction, it may have no size either 
(Sect. 4.3.4). Still, this is only from our own (subjective) point of view. The photon 
may disagree: in its own self-system, it is at rest, so it may well have both mass 
and size. In fact, in its own self-system, the photon may even contain a whole new 
universe, with many other mankinds in it! Only because the photon is so fast don’t 
we get to see this interesting universe! 

On the contrary: from the photon’s perspective, the entire universe travels at the 
speed of light in the opposite direction. Thanks to length contraction, in the photon’s 
eyes, the entire universe is as small as a single point, with no size or mass at all. So, 
we don’t even exist! 

Yet worse, the photon is not the only one who says so. In fact, all photons in the
entire universe agree on just one thing: the universe doesn’t exist at all! So many 
witnesses can’t be wrong, can they? 


4.9 Exercises 


4.9.1 Motion in Three Dimensions 


1. Show that the determinant of a 2 x 2 matrix is the same as the area of the 
parallelogram made by its column vectors. Hint: see Chap. 2, Sect.2.3.3. 

2. Show that the determinant of a 2 x 2 matrix is the same as the area of the 
parallelogram made by its row vectors. Hint: see Chap. 2, Sect.2.1.3. 

3. Show that the Lorentz matrix has determinant 1. 

4. Consider a 2 x 2 matrix. Multiply it by a Lorentz matrix. Show that the original
determinant hasn’t changed. Hint: see Chap. 2, Sect. 2.1.3. 

5. Conclude that the Lorentz transformation preserves area in the two-dimensional 
Cartesian plane. 

6. Conclude also that the inverse Lorentz matrix has determinant 1 as well.

7. Conclude that the inverse transformation preserves area as well.

8. Use Cramer’s formula (Chap. 2, Sect. 2.1.4) to calculate the inverse Lorentz
matrix directly.

9. Does this agree with the trick in Sect. 4.2.5: to have the inverse, just change the
sign of the velocity, replacing v by −v?

10. What is the physical meaning of this? Hint: from the moving system, the lab
seems to move in the opposite direction.

11. Let

$$v \equiv \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} \in \mathbb{R}^3$$

be some nonzero three-dimensional real vector. Define the 3 x 3 matrix $O_v$,
whose columns are v (normalized), a vector $v^\perp$ that is orthogonal to v (normalized
as well), and their vector product:

$$O_v \equiv \left( \frac{v}{\|v\|} \;\Big|\; \frac{v^\perp}{\|v^\perp\|} \;\Big|\; \frac{v \times v^\perp}{\|v\| \cdot \|v^\perp\|} \right).$$

Show that $O_v$ is an orthogonal matrix. Hint: see Chap. 2, Sects. 2.2.4-2.3.2.

12. Conclude that $O_v$ has determinant 1.

13. Consider a particle that moves at velocity $v \in \mathbb{R}^3$ with respect to the lab. In other
words, the particle moves at direction $v/\|v\|$ at speed $\|v\|$. Let (x’, y’, z’, ct’)
denote the lab coordinates, and (x”, y”, z”, ct”) the self coordinates of the particle.
We are now ready to define the more general Lorentz transformation

$$\begin{pmatrix} x' \\ y' \\ z' \\ ct' \end{pmatrix} \to \begin{pmatrix} x'' \\ y'' \\ z'' \\ ct'' \end{pmatrix} = L_v \begin{pmatrix} x' \\ y' \\ z' \\ ct' \end{pmatrix},$$

where $L_v$ is the following 4 x 4 Lorentz matrix:

$$L_v \equiv \begin{pmatrix} O_v & \\ & 1 \end{pmatrix} \begin{pmatrix} \gamma(\beta_{\|v\|}) & & & \\ & 1 & & \\ & & 1 & \\ & & & \gamma(\beta_{\|v\|}) \end{pmatrix} \begin{pmatrix} 1 & & & -\beta_{\|v\|} \\ & 1 & & \\ & & 1 & \\ -\beta_{\|v\|} & & & 1 \end{pmatrix} \begin{pmatrix} O_v^t & \\ & 1 \end{pmatrix}.$$

(As usual, blank spaces stand for zero matrix elements.) Show that this indeed
transforms the lab system to the self-system of the particle.

14. Show that $L_v$ has determinant 1.

15. Consider also a second particle, moving at velocity $-u \in \mathbb{R}^3$ with respect to
the lab. Denote its self coordinates by (x, y, z, ct). With respect to this system,
the entire lab moves at velocity $u \in \mathbb{R}^3$. Show that the transformation from this
system to the lab system is

$$\begin{pmatrix} x \\ y \\ z \\ ct \end{pmatrix} \to \begin{pmatrix} x' \\ y' \\ z' \\ ct' \end{pmatrix} = L_u \begin{pmatrix} x \\ y \\ z \\ ct \end{pmatrix}.$$


16. Consider the transformation

$$\begin{pmatrix} x \\ y \\ z \\ ct \end{pmatrix} \to \begin{pmatrix} x'' \\ y'' \\ z'' \\ ct'' \end{pmatrix}$$

from the self-system of the second particle to the self-system of the first particle.
Show that it is represented by the matrix product $L_v L_u$:

$$\begin{pmatrix} x'' \\ y'' \\ z'' \\ ct'' \end{pmatrix} = L_v \begin{pmatrix} x' \\ y' \\ z' \\ ct' \end{pmatrix} = L_v L_u \begin{pmatrix} x \\ y \\ z \\ ct \end{pmatrix}.$$

17. Show that $L_v L_u$ has determinant 1 as well. Hint: see Chap. 2, Sect. 2.1.3.

18. Does $L_u$ commute with $L_v$? Hint: only if u is a scalar multiple of v.

19. Consider the inverse Lorentz transformation

$$\begin{pmatrix} x'' \\ y'' \\ z'' \\ ct'' \end{pmatrix} \to \begin{pmatrix} x \\ y \\ z \\ ct \end{pmatrix}$$

from the self-system of the first particle back to the self-system of the second
particle. Show that it is represented by the inverse matrix

$$(L_v L_u)^{-1} = L_u^{-1} L_v^{-1} = L_{-u} L_{-v}.$$

20. Conclude that the last column in $L_{-u} L_{-v}$ describes the motion of the first particle
away from the second one. Hint: see the exercises below.

21. Show that, in its self-system, the first particle is at rest.

22. Conclude that t” is a proper time. Hint: see Sect. 4.3.1.

23. Conclude also that

$$L_v L_u \begin{pmatrix} x \\ y \\ z \\ ct \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ ct'' \end{pmatrix}.$$

24. Show that it is sufficient to solve this up to a scalar multiple. Hint: we’re only
interested in the slopes (or ratios) x/t, y/t, and z/t.

25. Conclude that it is sufficient to solve

$$L_v L_u w = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}, \quad \text{where} \quad w \equiv \begin{pmatrix} x \\ y \\ z \\ ct \end{pmatrix}$$

(up to a scalar multiple).

26. Simplify this even more to read

$$L_u w = L_{-v} \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}$$

(up to a scalar multiple).

27. Simplify this even more, to read

$$L_u w = \begin{pmatrix} v \\ c \end{pmatrix}$$

(up to a scalar multiple).

28. Consider a special case, in which u aligns with the x-axis:

$$u = \begin{pmatrix} \|u\| \\ 0 \\ 0 \end{pmatrix}.$$

Show that, in this case, one could design $O_u = I$ (the 3 x 3 identity matrix).

29. Show that, in this case,

$$L_u = \begin{pmatrix} \gamma(\beta_{\|u\|}) & & & \\ & 1 & & \\ & & 1 & \\ & & & \gamma(\beta_{\|u\|}) \end{pmatrix} \begin{pmatrix} 1 & & & -\beta_{\|u\|} \\ & 1 & & \\ & & 1 & \\ -\beta_{\|u\|} & & & 1 \end{pmatrix}.$$


30. Show that this is not just a special case, but a most general case. Hint: for a
general u, pick the x-axis to align with u in the first place.

31. Use w (as defined above) to uncover the slopes (or the velocity) along which the
first particle travels away from the second particle:

$$\begin{pmatrix} dx/dt \\ dy/dt \\ dz/dt \end{pmatrix} = \frac{c}{w_4} \begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix}.$$


Interpret L, as a projective mapping in the real projective space (Chap. 6, 
Sects. 6.7.3 and 6.9.1). 

33. Interpret the above method as the three-dimensional extension of the methods
in Sects. 4.4.2-4.4.3.

34. Likewise, in Sect. 4.4.3, interpret the inverse Lorentz transformation back to
the x-y-t self-system of the second particle as a projective mapping in the real
projective plane (Chap. 6, Sects. 6.4.1 and 6.7.3).

35. What does this mapping do? Hint: it maps the original velocity (dx’/dt’, dy’/dt’)
of the first particle in the lab (Fig. 4.6) to the new velocity (dx/dt, dy/dt) of the
first particle away from the second one (Fig. 4.7).

36. In Figs. 4.6 and 4.7, where is the time variable? Why is it missing? Hint: these
figures are static, not dynamic: they tell us the position, not the time. Time is just
a parameter, telling us how a particle moves in the direction pointed at by the
arrow. Still, we can’t see this: only in a movie could we see these dynamics, not
in a static picture. How did we get rid of time? We just divided by t (or t’). This
way, the time variable was eliminated. Now, it is only used implicitly to push
the particle along the arrow: the velocity vector.


Part II 
Introduction to Group Theory 


What have we done so far? Well, the vectors introduced above make a linear space. 
Indeed, the algebraic operations between them are linear. The (nonsingular) matrices, 
on the other hand, make a new mathematical structure: a group.

In a group, although the commutative law does not necessarily hold, the associative
law does hold. In what follows, we introduce group theory, including the first, second, 
and third isomorphism theorems, and their geometrical applications. 

Matrices are particularly useful to represent all sorts of practical transformations 
in geometry and physics. In special relativity, for example, Lorentz transformations 
are written as 2 x 2 matrices. Here, we’ll put this in a much wider context: group 
representation. To show how useful this is, we'll represent projective mappings as 
3 x 3 matrices. This is particularly useful in computer graphics. Finally, we’ll also 
use matrices to introduce yet another important field: quantum mechanics. 


Chapter 5
Group Representation and Isomorphism Theorems


What is the most elementary algebraic object? This could be the individual number. In 
the previous part, we also introduced more complicated algebraic structures: vectors 
and matrices. 

Furthermore, elementary algebraic objects like numbers, once used as input and 
output, form a yet more advanced mathematical object: a function. The polynomial, 
for example, is just a special kind of function, enjoying many algebraic operations: 
addition, multiplication, and composition. 

Functions are indeed studied in a few major mathematical fields. In set theory, a 
set of functions is often studied just like any other set, and its cardinality is estimated. 
In algebra, on the other hand, functions are also viewed as algebraic objects that can 
be composed with each other. Finally, in calculus, functions are also considered as 
analytic objects that can be differentiated and integrated. 

In this chapter, we consider a special kind of function: a mapping or transforma- 
tion. Together, the transformations form a new mathematical structure: a group, with 
a lot of interesting properties. 

To help study a mapping, we mirror it by a matrix. This way, algebraic operations 
are mirrored as well: composition of two mappings is mirrored by multiplication of 
two matrices. This is indeed group representation. 

This point of view is most useful in the practical implementation. After all, a 
mapping could hardly be stored on the computer. A matrix, on the other hand, can. 
Furthermore, the representation could help understand the deep nature of the original 
mapping as an algebraic and geometrical object. 

We have already seen an example of a useful transformation: in special relativity 
(Chap. 4), the Lorentz transformation has been represented as a 2 x 2 matrix. Here, 
on the other hand, we put this in a much wider context: group theory. In particular, 
we prove the first, second, and third isomorphism theorems, used later in projective 
geometry. 
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5.1 Moebius Transformation and Matrix 


5.1.1 Riemann Sphere—Extended Complex Plane 


To make the discussion more concrete, we need a new geometrical concept: the 
infinity “point.” As a matter of fact, this is not really a point. Still, it can be added to 
the complex plane, to form a complete “sphere.” 

The extended complex plane (or the Riemann sphere)

$$\mathbb{C} \cup \{\infty\}$$

is obtained from the original complex plane C by adding one more object: the infinity
point ∞. This new “point” is not really a point, but a new artificial object, to help
model a complex number with an arbitrarily large absolute value.

The infinity point is unique. In fact, in the complex plane, one could draw a ray 
issuing from the origin in just any angle. The complex number z could then “slide” 
along this ray, and approach the same point: infinity. Later on, we’ll also meet more 
complicated spaces, with many infinity points. 


5.1.2. Moebius Transformation and the Infinity Point 


Thanks to the infinity point, we can now define the Moebius transformation. 
We’ve already met this transformation in the context of special relativity (Chap. 4, 
Sect. 4.4.2). Here, however, we introduce it in much more detail and depth, and in a 
much wider context [4, 59]. 

A Moebius transformation (or mapping) from the extended complex plane onto
itself is defined by

$$z \to \frac{az+b}{cz+d},$$

where a, b, c, and d are some fixed complex parameters. Here, ‘→’ stands for
transformation, not for a limit.

Unfortunately, the definition is still incomplete. After all, the mapping is not yet
defined at the infinity point z = ∞. Let us complete this gap in such a way that the
mapping remains continuous:

$$\infty \to \frac{a}{c}.$$

This way, z could approach infinity on just any ray in the complex plane. After all, 
in every direction, the transformed values converge to a/c, as required. 

Still, the definition is not yet complete. What happens at the pole z = −d/c, at
which the denominator vanishes? Well, to preserve continuity, define

$$-\frac{d}{c} \to \infty.$$


This way, z could approach the pole from just any direction. In either case, the 
transformed values would approach a unique point: the infinity point. 

There is one case in which these formulas coincide, and agree with each other.
This happens if c = 0. In this case, both formulas read

$$\infty \to \infty.$$

Still, this makes no problem: c could safely vanish. There is something else that must
never vanish.
Indeed, to make sure that the mapping is invertible, the parameters a, b, c, and d
must satisfy the condition

$$ad - bc \ne 0.$$


Otherwise, we’d have ad = bc, which means that
• either c ≠ 0, so the entire complex plane is mapped to

$$\frac{az+b}{cz+d} = \frac{1}{c} \cdot \frac{acz+bc}{cz+d} = \frac{1}{c} \cdot \frac{acz+ad}{cz+d} = \frac{a}{c},$$

• or c = a = 0, so the entire complex plane is mapped to b/d,
• or c = d = 0, so the entire complex plane is mapped to ∞.


In either case, the transformation would be constant, and not invertible. This is why 
the above condition is necessary to make sure that the original Moebius transforma- 
tion is indeed nontrivial and invertible. 
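
The complete definition, including the two special points, can be sketched in a few lines of Python. This is only an illustration: the name `moebius`, the use of a floating-point infinity as a stand-in for the single infinity point, and the sample parameters are all assumptions.

```python
INF = float("inf")  # stands for the single infinity point of the Riemann sphere


def moebius(a, b, c, d):
    """Return the mapping z -> (az + b)/(cz + d) on the extended complex plane."""
    if a * d - b * c == 0:
        raise ValueError("ad - bc must be nonzero")

    def m(z):
        if z == INF:
            # Infinity is mapped to a/c (or to itself if c = 0).
            return INF if c == 0 else a / c
        denominator = c * z + d
        if denominator == 0:
            return INF  # the pole -d/c is mapped to infinity
        return (a * z + b) / denominator

    return m


# Illustrative parameters (assumed): a = 1, b = 2, c = 1, d = -2, with pole at z = 2.
m = moebius(1, 2, 1, -2)
```

With these parameters, `m(INF)` gives a/c = 1, `m(2)` gives the infinity point, and `m(0)` gives −1. Parameters with ad = bc are rejected, since they would define a constant mapping.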


5.1.3 The Inverse Transformation 


Fortunately, the condition ad − bc ≠ 0 is not only necessary but also sufficient to
guarantee that the original Moebius transformation is invertible. Indeed, assume that
the original complex number z is mapped to the new complex number

$$u \equiv \frac{az+b}{cz+d}.$$

To have the inverse mapping in its explicit form, let us write z in terms of u. For this
purpose, let us multiply the above equation by the denominator: 


u(cz +d) =az+b. 
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Now, let us throw those terms that contain z to the left-hand side, and the other terms 
to the right-hand side: 
z(cu — a) = —du +b. 


This implies that

$$z = \frac{-du+b}{cu-a} = \frac{du-b}{-cu+a}.$$

Thus, the required inverse transformation is

$$z \to \frac{dz-b}{-cz+a}.$$

To make sure that this transformation is indeed continuous, we must also define it
properly at ∞ and at its pole, a/c:

$$\infty \to -\frac{d}{c} \quad \text{and} \quad \frac{a}{c} \to \infty.$$

These are just the reverse of the original definitions in Sect.5.1.2. This way, the 
inverse transformation indeed maps the infinity point back to the pole of the original 
transformation, as required. 


5.1.4 Moebius Transformation as a Matrix 


The parameters in the original Moebius transformation are defined up to a scalar
multiple. After all, for any nonzero complex number q ≠ 0, the same transformation
could also be defined by

$$z \to \frac{qaz+qb}{qcz+qd}.$$

Thus, the original Moebius transformation is associated with the 2 x 2 matrix

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix},$$

or just any nonzero scalar multiple of it.
Similarly, the inverse mapping is associated with the matrix

$$\begin{pmatrix} d & -b \\ -c & a \end{pmatrix},$$

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or just any nonzero scalar multiple of it. As can be seen in Chap. 2, Sect.2.1.4, this 
matrix is just a scalar multiple of the inverse of the original matrix. 

Later on, we’ll see that this is not just an association, but much more: the matrix 
actually represents and mirrors the original transformation. In terms of a matrix, the 
original condition takes the form 


$$\det\left( \begin{pmatrix} a & b \\ c & d \end{pmatrix} \right) = ad - bc \ne 0.$$


Why does this condition make sense? Because it guarantees that the original matrix 
is nonsingular (invertible), as required. 

Thus, the set of invertible Moebius transformations is mirrored by the set of 
(complex) nonsingular 2 x 2 matrices, defined up to a nonzero scalar multiple. Later 
on, we’ll refer to this as isomorphism or group representation [21, 25, 34, 35]. 
Indeed, it preserves the same algebraic structure: the composition of two Moebius 
transformations is mirrored by the product of the 2 x 2 matrices associated with 
them. 


5.1.5 Product of Moebius Transformations 


The product of the Moebius transformations $m'$ and $m$ is defined as their composition:

$$m'm \equiv m' \circ m.$$

This means that for every $z \in \mathbb{C} \cup \{\infty\}$,

$$(m'm)(z) \equiv m'(m(z)).$$


Note that this algebraic operation is associative. Indeed, for every three Moebius
transformations $m$, $m'$, and $m''$,

$$\begin{aligned}
((m''m')m)(z) &= (m''m')(m(z)) \\
&= m''(m'(m(z))) \\
&= m''((m'm)(z)) \\
&= (m''(m'm))(z).
\end{aligned}$$

Since this applies to each and every $z \in \mathbb{C} \cup \{\infty\}$, it can be written more concisely
as

$$(m''m')m = m''(m'm),$$


which makes an associative law. This is no surprise: after all, as discussed below, 
this kind of composition is mirrored by matrix multiplication. 
5.2 Matrix: A Function


5.2.1 Matrix as a Vector Function 


As discussed above, each invertible Moebius transformation is associated with a
nonsingular 2 x 2 complex matrix (defined up to a nonzero scalar multiple). Fortunately,
matrix-times-matrix multiplication could be viewed as a composition as well.

For this purpose, consider a 2 x 2 matrix $g$. Let’s interpret it as a special vector
function, rather than just a matrix:

$$g : \mathbb{C}^2 \to \mathbb{C}^2,$$

with the explicit definition

$$g(v) \equiv gv, \quad v \in \mathbb{C}^2.$$


Here, when g is followed by v, with no parentheses, then this is a matrix-vector 
product, as in Chap. 1, Sect. 1.4.4. This is then used to define the new vector function, 
which uses round parentheses. 

Why is this new interpretation equivalent to the original one? Because it
characterizes $g$ uniquely! Indeed, given a matrix $g$, we’ve already seen how it is used to
define a unique vector function. Conversely, given a vector function of the above
form, we can easily reconstruct the unique matrix that defines it. For this purpose,
just apply the vector function $g()$ to the standard unit vectors $(1, 0)^t$ and $(0, 1)^t$, to
uncover the matrix $g$ column by column:

$$g = \left( v^{(1)} \mid v^{(2)} \right), \quad \text{where} \quad v^{(1)} \equiv g\left(\begin{pmatrix} 1 \\ 0 \end{pmatrix}\right) \quad \text{and} \quad v^{(2)} \equiv g\left(\begin{pmatrix} 0 \\ 1 \end{pmatrix}\right).$$
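
The column-by-column reconstruction can be sketched in a few lines. This is an illustrative sketch only: `as_function` and `recover_matrix` are hypothetical helper names, and matrices are stored as nested tuples of rows.

```python
def as_function(g):
    """Interpret the 2 x 2 matrix g as the vector function v -> gv."""
    def f(v):
        return (
            g[0][0] * v[0] + g[0][1] * v[1],
            g[1][0] * v[0] + g[1][1] * v[1],
        )
    return f


def recover_matrix(f):
    """Rebuild the matrix from f by applying it to the standard unit vectors."""
    column1 = f((1, 0))
    column2 = f((0, 1))
    # The images of the unit vectors are the columns of the matrix.
    return ((column1[0], column2[0]), (column1[1], column2[1]))


g = ((1, 2j), (3, 4))
recovered = recover_matrix(as_function(g))
```

Here `recovered` carries the same entries as `g`, which is the sense in which the vector function characterizes the matrix uniquely.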


5.2.2. Matrix Multiplication as Composition 


Let $g$ and $g'$ be 2 x 2 matrices. With their new interpretation as vector functions,
their product can now be viewed as a composition:

$$g'g \equiv g' \circ g.$$

This means that for every two-dimensional vector $v \in \mathbb{C}^2$,

$$(g'g)(v) \equiv g'(g(v)).$$

Does this agree with the original definition, the matrix-times-matrix product in
Chap. 1, Sect. 1.4.5? Well, let’s look at the matrix $g'g$. Thanks to associativity, it
defines the vector function

$$(g'g)(v) = (g'g)v = g'(gv) = g'(g(v)),$$


which is the same as the above composition. 

Furthermore, we’ve already seen that this matrix is unique: g’g is the only matrix 
that could be used to define the above composition. 

So, matrices could mirror these special vector functions: matrix-times-matrix
could mirror composition. How does this help to mirror Moebius transformations?
To get to the bottom of this, we had better generalize: study not only
one special case, but also a much wider field: groups.


5.3. Group and Its Properties 


5.3.1 Group 


The above g is just a special case. In general, g could be not only a 2 x 2 matrix but 
also any element in a group. 

A group G is a set of elements or objects, with some algebraic operation between 
them. This operation is called multiplication or product. Thus, the group is closed 
under this kind of multiplication: for every two elements g and g’ in G, their products 
gg’ and g'g (which are not necessarily the same) are legitimate elements in G as well. 

This kind of multiplication might be rather strange and nonstandard. Fortunately,
it mustn’t be too nonstandard: although it doesn’t have to be commutative, it must
still be associative: for every three elements $g$, $g'$, and $g''$ in $G$,

$$g(g'g'') = (gg')g''.$$


5.3.2. The Unit Element 


Furthermore, it is also assumed that $G$ contains a unit element $I$ that satisfies

$$Ig = gI = g$$

for every element $g \in G$. Here, $I$ has nothing to do with the identity matrix in
Chap. 1, Sect. 1.5.2. It is just some special element in $G$.

Fortunately, this unit element is unique in $G$. Indeed, assume that $I' \in G$ was a
unit element as well:

$$gI' = I'g = g$$

for every element $g \in G$. In particular, this would be true for $g = I$:

$$II' = I.$$

Since $I$ is the original unit element, we also have

$$II' = I'.$$

In summary,

$$I' = II' = I,$$

so $I'$ is not really new: it is the same as the original unit element $I$.


5.3.3 Inverse Element 


Finally, it is also assumed that every element $g \in G$ has an inverse element $g' \in G$
(dependent on $g$), for which

$$gg' = I.$$

Even if the commutative law doesn’t hold in $G$ in general, $g'$ does commute with $g$:

$$g'g = I.$$

Indeed, thanks to the associative law,

$$g'g = g'(Ig) = g'((gg')g) = g'(g(g'g)) = (g'g)(g'g).$$

Now, let’s multiply this equation (from the right) by an inverse of $g'g$, denoted
by $(g'g)'$. Thanks again to the associative law, we have

$$I = ((g'g)(g'g))(g'g)' = (g'g)((g'g)(g'g)') = (g'g)I = g'g,$$

as asserted.
Furthermore, the inverse of $g$ is unique. Indeed, assume that $g$ had yet another
inverse $g''$, satisfying

$$gg'' = I$$

as well. Thanks to the associative law, we’d then have

$$g'' = Ig'' = (g'g)g'' = g'(gg'') = g'I = g'.$$
Thus, $g''$ isn’t really new: it is the same as $g'$. The unique inverse of $g$ can now be
denoted by $g^{-1}$ rather than $g'$. Once we have this new notation, we can use $g'$ once
again to stand for a general element in $G$, independent of $g$.

Let us show that the inverse of the inverse is the original element itself:

$$\left(g^{-1}\right)^{-1} = g.$$

Indeed, we already know that $g^{-1}$ is the inverse of $g$, not only from the right but also
from the left:

$$g^{-1}g = I.$$

Fortunately, this equation could also be interpreted the other way around: $g$ behaves
as expected from an inverse to $g^{-1}$. But $g^{-1}$ has only one inverse, so it must indeed
be $g$:

$$\left(g^{-1}\right)^{-1} = g,$$

as asserted. Thus, the inverse operation is symmetric: not only is $g^{-1}$ the inverse of
$g$, but also $g$ is the inverse of $g^{-1}$.
Finally, consider two general elements $g', g \in G$. How to invert their product $g'g$?
Take the individual inverses, in the reverse order:

$$(g'g)^{-1} = g^{-1}g'^{-1}.$$

Indeed, look at the right-hand side. Does it behave as expected from the inverse of
$g'g$? Well, thanks to the associative law,

$$(g'g)\left(g^{-1}g'^{-1}\right) = \left((g'g)g^{-1}\right)g'^{-1} = \left(g'\left(gg^{-1}\right)\right)g'^{-1} = (g'I)g'^{-1} = g'g'^{-1} = I.$$


The assertion follows now from the uniqueness property. 
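
These properties can be tried out on a small noncommutative group. The sketch below assumes, purely for illustration, the group of all permutations of three symbols, stored as tuples; the names `compose`, `inverse`, and `S3` are hypothetical helpers, not notation from the text.

```python
from itertools import permutations


def compose(p, q):
    """Product pq of two permutations: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    """The permutation that undoes p."""
    result = [0] * len(p)
    for i, image in enumerate(p):
        result[image] = i
    return tuple(result)


S3 = list(permutations(range(3)))
identity = (0, 1, 2)

# The group is not commutative ...
noncommutative = any(compose(p, q) != compose(q, p) for p in S3 for q in S3)

# ... yet each inverse works from both sides,
two_sided = all(
    compose(g, inverse(g)) == identity and compose(inverse(g), g) == identity
    for g in S3
)

# and a product is inverted by the individual inverses in the reverse order.
reverse_order = all(
    inverse(compose(h, g)) == compose(inverse(g), inverse(h))
    for h in S3 for g in S3
)
```

All three flags come out true for this group, in line with the general argument above.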


5.4 Mapping and Homomorphism 


5.4.1 Mapping and Its Origin 


Here we recall some properties of sets and mappings, which are particularly relevant
to the present discussion. Let $G$ and $M$ be some sets (not necessarily groups). A
mapping $\xi$ from $G$ to $M$ is a function

$$\xi : G \to M$$

that maps each element $g \in G$ to an element

$$\xi(g) \in M.$$

Consider now some element $m \in M$. What elements are mapped to it from $G$? There
could be many. Let’s place them in one subset, called the origin of $m$ under $\xi$:

$$\xi^{-1}(m) \equiv \{ g \in G \mid \xi(g) = m \} \subset G.$$

This is just a notation for a subset of $G$. It has nothing to do with inverse. In fact, $\xi$
may have no inverse mapping at all. After all, $\xi^{-1}(m)$ might contain more than one
element in $G$. On the other hand, $\xi^{-1}(m)$ might also be completely empty. In either
case, no inverse mapping could possibly be defined.

We say that $\xi$ is onto $M$ if every element $m \in M$ has a nonempty origin, with at
least one element in it:

$$\left|\xi^{-1}(m)\right| = \left|\{ g \in G \mid \xi(g) = m \}\right| \geq 1, \quad m \in M.$$

Furthermore, we say that $\xi$ is one-to-one if every element $m \in M$ has at most one
element in its origin:

$$\left|\xi^{-1}(m)\right| = \left|\{ g \in G \mid \xi(g) = m \}\right| \leq 1, \quad m \in M.$$

Clearly, we can now combine these properties: $\xi$ is a one-to-one mapping from $G$
onto $M$ if every element $m \in M$ has exactly one element in its origin:

$$\left|\xi^{-1}(m)\right| = \left|\{ g \in G \mid \xi(g) = m \}\right| = 1, \quad m \in M.$$

This element is the one to which $m$ is mapped by the inverse mapping $\xi^{-1}$. Only in
this case is $\xi$ invertible.
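
For finite sets, these definitions translate directly into code. The following sketch uses two sample mappings on the set {1, ..., 6} (squaring modulo 7 and multiplication by 3 modulo 7); both mappings and the helper names `origin`, `is_onto`, and `is_one_to_one` are assumptions made for illustration.

```python
def origin(xi, G, m):
    """The origin of m under xi: every g in G with xi(g) == m."""
    return {g for g in G if xi(g) == m}


def is_onto(xi, G, M):
    """Every m in M has at least one element in its origin."""
    return all(len(origin(xi, G, m)) >= 1 for m in M)


def is_one_to_one(xi, G, M):
    """Every m in M has at most one element in its origin."""
    return all(len(origin(xi, G, m)) <= 1 for m in M)


G = range(1, 7)


def square_mod_7(g):
    return (g * g) % 7


def triple_mod_7(g):
    return (3 * g) % 7


two_origins = origin(square_mod_7, G, 2)   # two elements are mapped to 2
no_origin = origin(square_mod_7, G, 3)     # nothing is mapped to 3
```

Squaring is onto the smaller set {1, 2, 4} but not one-to-one, so it has no inverse mapping; multiplication by 3 is both onto and one-to-one on {1, ..., 6}, so it is invertible.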


5.4.2. Homomorphism 


Unlike in a mere set, the elements in the group are algebraic: they can multiply each
other. Furthermore, they can also be mapped to yet another group, to have a better
idea about their nature and properties. For this purpose, however, the mapping must
preserve or mirror the original algebraic structure.

Let $G$ and $M$ be two groups. A (not necessarily one-to-one) mapping $\xi$ from $G$
onto $M$ is called a homomorphism if it preserves algebraic operations: a product in
$G$ is mapped to the product in $M$ (Fig. 5.1).

More precisely, for every two elements $g, g' \in G$, order doesn’t matter: multiplying
in $G$ and then transferring the product to $M$ is the same as transferring to $M$ and
then multiplying in $M$:

$$\xi(gg') = \xi(g)\xi(g').$$

Why is this convenient? Because, when $\xi$ is not one-to-one, $M$ is “smaller,” and easier
to sort out. It is partially “blind:” it doesn’t distinguish between different elements
in $G$ that are of the same kind.

Fig. 5.1 The homomorphism $\xi$ from the original group $G$ onto the group $M$ is not necessarily one-to-one. Still, it preserves (or mirrors) the algebraic operation, denoted by the vertical arrows (product in $G$ on the left, product in $M$ on the right)


5.4.3. Mapping the Unit Element 


Since the homomorphism preserves the algebraic operation, it must map the original
unit element $I \in G$ to the unit element $i \in M$:

$$\xi(I) = i.$$

Here, $i$ has nothing to do with the notation $i = \sqrt{-1}$, used often in complex analysis.
It is just a coincidence that both notations use the same letter $i$.
Indeed,

$$\xi(I) = \xi(II) = \xi(I)\xi(I).$$

Now, in $M$, $\xi(I)$ must have an inverse. Let’s use it to multiply this equation (say
from the right). Thanks to associativity, we then have

$$i = \xi(I)(\xi(I))^{-1} = (\xi(I)\xi(I))(\xi(I))^{-1} = \xi(I)\left(\xi(I)(\xi(I))^{-1}\right) = \xi(I)i = \xi(I),$$


as asserted. 


5.4.4 Preserving the Inverse Operation 


Thanks to the above properties, the homomorphism also preserves the inverse
operation (Fig. 5.2): while $g$ maps to $\xi(g)$, the inverse of $g$ in $G$ must map to the inverse
of $\xi(g)$ in $M$. Indeed,

$$i = \xi(I) = \xi\left(gg^{-1}\right) = \xi(g)\xi\left(g^{-1}\right),$$

so

$$(\xi(g))^{-1} = \xi\left(g^{-1}\right)$$

in $M$, as asserted.

Fig. 5.2 Thanks to the homomorphism $\xi$, the inverse operation in the original group $G$ is mirrored or preserved in the group $M$ as well

Still, recall that the homomorphism is not necessarily one-to-one. Thus, $I$ may not be
the only element that maps to $i$. The elements that map to $i$, including $I$, form a
special subset: the kernel.


5.4.5 Kernel of a Mapping 


The kernel of a mapping $\xi$ (not necessarily a homomorphism) contains those elements
that are mapped to the unit element $i$ in $M$:

$$\xi^{-1}(i) = \{ g \in G \mid \xi(g) = i \}$$

(Fig. 5.3). Recall that this is just a notation for a subset of $G$. It means no inverse:
after all, $\xi$ is not necessarily invertible.

If $\xi$ is onto $M$, then the kernel must be nonempty (Sect. 5.4.1). For example, in
the present context, in which $\xi$ is a homomorphism, the kernel must contain at least
one element: the unit element $I$ (Sect. 5.4.3).

Unfortunately, the original homomorphism $\xi$, although it maps $G$ onto $M$ while
preserving algebraic operations, is not necessarily one-to-one, so it is not necessarily
invertible. For example, the kernel may contain more elements than just $I$. This means
that $G$ and $M$ do not exactly mirror each other. Fortunately, $\xi$ can still be modified
to form an invertible mapping. For this purpose, we need a new concept: subgroup.

Fig. 5.3 The homomorphism $\xi$ maps its entire kernel $\xi^{-1}(i)$ (on the left) to the unit element $i \in M$ (on the right)


5.5 The Center and Kernel Subgroups


5.5.1 Subgroup 


A subgroup is a subset that is a group in its own right: a subset $S \subset G$ is a subgroup
if

1. $S$ is closed under multiplication:
$$s, s' \in S \Rightarrow ss' \in S.$$

2. $S$ contains the unit element:
$$I \in S.$$

3. $S$ is closed under the inverse operation:
$$s \in S \Rightarrow s^{-1} \in S.$$

Fortunately, $S$ also inherits the associative law:

$$s, s', s'' \in S \Rightarrow s, s', s'' \in G \Rightarrow (ss')s'' = s(s's'').$$

Thus, $S$ is indeed a legitimate group in its own right.

As a matter of fact, to make sure that a subset $S$ is also a subgroup, it is sufficient
to check just two conditions:

1. $S$ is closed under division:
$$s, s' \in S \Rightarrow ss'^{-1} \in S.$$

2. $S$ contains the unit element:
$$I \in S.$$

Indeed, under these conditions, $S$ is also closed under the inverse operation:

$$s \in S \Rightarrow s^{-1} = Is^{-1} \in S.$$

As a result, $S$ is also closed under multiplication:

$$s, s' \in S \Rightarrow s, s'^{-1} \in S \Rightarrow ss' = s\left(s'^{-1}\right)^{-1} \in S.$$

Thus, the original three conditions hold, so $S$ is indeed a legitimate subgroup of $G$.


5.5.2 The Center Subgroup


Recall that our original group $G$ is not necessarily commutative: it may contain
“bad” elements that don’t commute with each other. Still, it may also contain “good”
elements that do commute with every element. Let’s place them in a new subset: the
center $C$:

$$C \equiv \{ c \in G \mid cg = gc \text{ for every } g \in G \}.$$

By now, we only know that $C$ is a subset. Is it also a subgroup? Let’s see: is it
closed under multiplication? Well, let $c$ and $c'$ be two elements in $C$. Thanks to
the associative law inherited from $G$, the product $cc'$ commutes with every element
$g \in G$:

$$(cc')g = c(c'g) = c(gc') = (cg)c' = (gc)c' = g(cc').$$

Thus, $cc'$ is in $C$ as well, as required.
Next, is the unit element $I$ in $C$? Well, it does commute with every element $g \in G$:

$$Ig = g = gI.$$

Finally, is $C$ closed under the inverse operation? In other words, for every element
$c \in C$, does its inverse $c^{-1}$ commute with every element $g \in G$? Well, we already
know that $c$ does:

$$cg = gc.$$

Now, let’s multiply this equation by $c^{-1}$ from the right. Thanks to associativity,

$$(cg)c^{-1} = (gc)c^{-1} = g\left(cc^{-1}\right) = gI = g.$$

Now, let’s multiply this equation by $c^{-1}$ from the left:

$$c^{-1}\left((cg)c^{-1}\right) = c^{-1}g.$$

Thanks again to associativity,

$$gc^{-1} = I\left(gc^{-1}\right) = \left(c^{-1}c\right)\left(gc^{-1}\right) = c^{-1}\left(c\left(gc^{-1}\right)\right) = c^{-1}\left((cg)c^{-1}\right) = c^{-1}g.$$

Thus, $c^{-1}$ does commute with every $g \in G$, so it does belong to $C$ as well, as required.
This proves that the center $C$ is not only a subset but also a subgroup of $G$.


5.5.3 The Kernel Subgroup 


$G$ has another interesting subset: the kernel $\xi^{-1}(i)$ of the homomorphism $\xi : G \to M$
(Sect. 5.4.5). Is it also a subgroup?
First, is it closed under multiplication? Well, for every two elements $g, g' \in \xi^{-1}(i)$,

$$\xi(gg') = \xi(g)\xi(g') = ii = i,$$

so their product is in the kernel as well:

$$gg' \in \xi^{-1}(i).$$

Next, is the unit element $I \in G$ in $\xi^{-1}(i)$ as well? Yes, it is (Sect. 5.4.3).

Finally, is the kernel also closed under the inverse operation? Well, for each
element $g \in \xi^{-1}(i)$, its inverse $g^{-1}$ is in $\xi^{-1}(i)$ as well:

$$\xi\left(g^{-1}\right) = (\xi(g))^{-1} = i^{-1} = i$$

(Sect. 5.4.4). This proves that the kernel of $\xi$ is not only a subset but also a subgroup
of $G$.
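
Both subgroups can be computed explicitly in a small example. The sketch below assumes the group of permutations of three symbols together with the sign mapping onto {1, -1} (a standard homomorphism, used here only as an illustration); the helper names `compose`, `inverse`, `sign`, and `is_subgroup` are hypothetical. The subgroup test is the two-condition test of Sect. 5.5.1.

```python
from itertools import permutations


def compose(p, q):
    """Product pq of two permutations: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    """The permutation that undoes p."""
    result = [0] * len(p)
    for i, image in enumerate(p):
        result[image] = i
    return tuple(result)


def sign(p):
    """+1 for an even number of inversions, -1 for an odd number."""
    inversions = sum(
        1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j]
    )
    return -1 if inversions % 2 else 1


def is_subgroup(S, unit):
    """Two-condition test: contains the unit and is closed under division."""
    return unit in S and all(compose(s, inverse(t)) in S for s in S for t in S)


G = set(permutations(range(3)))
unit = (0, 1, 2)

# The kernel: everything mapped to the unit element 1 of {1, -1}.
kernel = {g for g in G if sign(g) == 1}
# The center: everything that commutes with every element.
center = {c for c in G if all(compose(c, g) == compose(g, c) for g in G)}
```

In this example the kernel contains three elements and the center only the unit element; both pass `is_subgroup`.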


5.6 Equivalence Classes 


5.6.1 Equivalence Relation in a Set 


In a set $G$ (not necessarily a group), what is a relation? Well, a relation is actually a
subset of $G^2$: it may contain ordered pairs of the form $(g, g')$. We then say that $g$
is related to $g'$:

$$g \sim g'.$$

What is an equivalence relation? Well, this is a special kind of relation: it has three
properties:

1. Reflexivity: every element $g \in G$ is related to itself:
$$g \sim g.$$

2. Symmetry: for every two elements $g, g' \in G$, if $g$ is related to $g'$, then $g'$ is
related to $g$ as well:
$$g \sim g' \Rightarrow g' \sim g.$$

3. Transitivity: for every three elements $g, g', g'' \in G$, if $g$ is related to $g'$ and $g'$ is
related to $g''$, then $g$ is related to $g''$ as well:
$$g \sim g', \; g' \sim g'' \Rightarrow g \sim g''.$$

Let’s use this to decompose the original set $G$.

Fig. 5.4 The original set $G$ is decomposed (or split) into disjoint lines, or equivalence classes


5.6.2 Decomposition into Equivalence Classes 


Thanks to the equivalence relation, we can now decompose the original set $G$ (which
may be a group or not) in terms of disjoint equivalence classes (Fig. 5.4). In this
decomposition, each equivalence class contains those elements that are related (or
equivalent) to each other. In particular, each element $g \in G$ belongs to one
equivalence class only:

$$g \in \psi_g \equiv \{ g' \in G \mid g' \sim g \}.$$

Indeed, thanks to reflexivity, $g \sim g$, so $g \in \psi_g$. Now, could $g$ belong to yet another
equivalence class of the form $\psi_{g'}$, for any other $g' \in G$? Well, if it did, then this
would mean that $g \sim g'$. Thanks to transitivity, we’d then have

$$g'' \sim g \Rightarrow g'' \sim g',$$

so

$$\psi_g \subset \psi_{g'}.$$

On the other hand, thanks to symmetry, we’d also have $g' \sim g$. Thanks again to
transitivity, we’d then have

$$g'' \sim g' \Rightarrow g'' \sim g,$$

so

$$\psi_{g'} \subset \psi_g.$$

In summary, we’d have

$$\psi_g = \psi_{g'},$$

so $\psi_g$ is actually the only equivalence class containing $g$, as asserted.


5.6.3 Family of Equivalence Classes 


Let’s look at the family (or set) of these disjoint equivalence classes:

$$\{ \psi_g \mid g \in G \}.$$

In this family, it is assumed that there is no duplication: each equivalence class of
the form $\psi_g$ appears only once, with $g$ being some representative picked arbitrarily
from it. Furthermore, in this family, each equivalence class is an individual element,
not a subset. To pick its inner elements and obtain $G$ once again, one must apply the
union operation:

$$G = \bigcup_{g \in G} \psi_g.$$

This is the union of all the $\psi_g$’s: it contains all their elements.


5.6.4 Equivalence Relation Induced by a Subgroup 


So far, the discussion was rather theoretical. After all, we never specified what the
equivalence relation was. Now, let’s go back to business. Assume again that $G$ is not
just a set but actually a group, as before, with a subgroup $S \subset G$. This way, $S$ may
help define (or induce) a new relation: for every two elements $g'$ and $g$ in $G$, $g'$ is
related to $g$ if their “ratio” is in $S$:

$$g' \sim g \quad \text{if} \quad g'g^{-1} \in S.$$

In other words, there is an element $s \in S$ that can multiply $g$ and produce $g'$:

$$g' = sg.$$

In this case, $s = g'g^{-1}$ is unique.

Is this an equivalence relation? Well, let’s check: is it reflexive? In other words,
given a $g \in G$, is it related to itself? Fortunately, it is:

$$gg^{-1} = I \in S,$$

as required.
Next, is it symmetric? Well, consider two elements $g', g \in G$. Assume that $g' \sim g$,
or $g'g^{-1} \in S$. Since $S$ is a subgroup, it also contains the inverse element:

$$gg'^{-1} = \left(g'g^{-1}\right)^{-1} \in S,$$

so $g \sim g'$ as well, as required.

Finally, is it transitive? Well, consider three elements $g, g', g'' \in G$. Recall that $S$
is closed under multiplication: if $g'g^{-1} \in S$ and $g''g'^{-1} \in S$, then their product is in
$S$ as well. For this reason, thanks to associativity,

$$\begin{aligned}
g''g^{-1} &= g''\left(Ig^{-1}\right) \\
&= g''\left(\left(g'^{-1}g'\right)g^{-1}\right) \\
&= g''\left(g'^{-1}\left(g'g^{-1}\right)\right) \\
&= \left(g''g'^{-1}\right)\left(g'g^{-1}\right) \in S
\end{aligned}$$

as well, as required. In summary, this is indeed an equivalence relation in the original
group $G$.


5.6.5 Equivalence Classes Induced by a Subgroup 


With this new equivalence relation, what does an equivalence class look like? Well,
consider a particular element $g \in G$. As discussed in Sect. 5.6.2, it is contained in
one equivalence class only:

$$\begin{aligned}
\psi_g &= \{ g' \in G \mid g' \sim g \} \\
&= \{ g' \in G \mid g'g^{-1} \in S \} \\
&= \{ g' \in G \mid g'g^{-1} = s \text{ for some } s \in S \} \\
&= \{ g' \in G \mid g' = sg \text{ for some } s \in S \}.
\end{aligned}$$

Thus, the equivalence class takes the special form

$$\psi_g = Sg \equiv \{ sg \mid s \in S \}.$$
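
As a small check, the classes obtained from the relation can be compared with the sets $Sg$. The sketch below assumes, for illustration only, the group {1, ..., 6} under multiplication modulo 7 with the subgroup S = {1, 6}; `related` and `coset` are hypothetical helper names, and `pow(g, -1, p)` (the modular inverse) needs Python 3.8 or later.

```python
p = 7
G = set(range(1, p))   # nonzero residues, multiplied modulo 7
S = {1, 6}             # a subgroup: 6 * 6 = 36 = 1 (mod 7)


def related(g2, g1):
    """g2 ~ g1 if the 'ratio' g2 g1^{-1} lies in S."""
    return (g2 * pow(g1, -1, p)) % p in S


def coset(g):
    """The set Sg = {sg | s in S}."""
    return frozenset((s * g) % p for s in S)


# Equivalence classes built directly from the relation ...
classes_from_relation = {frozenset(h for h in G if related(h, g)) for g in G}
# ... and the same classes written as Sg.
classes_as_cosets = {coset(g) for g in G}
```

Storing the classes in a set drops the duplicates automatically: six representatives yield only three distinct classes, and their union is the whole group.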


5.7 The Factor Group 


5.7.1 The New Set G/S 


Let’s place these equivalence classes as individual elements in a new set (or family):

$$\{ Sg \mid g \in G \}.$$

Unfortunately, in this family, there is some duplication: each equivalence class of
the form $Sg$ may appear many times. In fact, every two equivalent elements $g' \sim g$
introduce the same equivalence class $Sg' = Sg$ into the above family.

To avoid this, one might want to drop all the duplicate copies of the form $Sg'$.
This way, $Sg$ appears only once, with $g$ being some representative picked arbitrarily
from it. The resulting family is called $G/S$ (Fig. 5.5).

Fig. 5.5 Disjoint equivalence classes in $G$ are considered as individual elements in $G/S$

Why is this name suitable? Because the original elements in $G$ are regarded only
up to multiplication (from the left) by just any element from $S$. This way, equivalent
elements in $G$ are united into one and the same element in $G/S$.

Thus, $G/S$ is completely blind to any difference between equivalent elements
$g' \sim g$, and can never distinguish between them. After all, in $G/S$, such elements
coincide to form the same element $Sg' = Sg$.


5.7.2. Normal Subgroup 


So far, $G/S$ was just a set. To make it a group, we must define a proper multiplication
between its elements. This is not always possible.

To guarantee that it is, we must also assume that $S$ is normal in the sense that it
“commutes” with every element $g \in G$:

$$Sg = gS \equiv \{ gs \mid s \in S \}.$$

In other words, for every $s \in S$, there is an $s' \in S$ (dependent on both $g$ and $s$) for
which

$$gs' = sg.$$

In this case, $s' = g^{-1}sg$ is unique.
For example, $S$ could be a subgroup of the center of $G$, defined in Sect. 5.5.2:

$$S \subset C \subset G.$$

In this case, $s' = s$ in the above equation.

What’s so good about $S$ being normal? Well, consider two elements $g, h \in G$. In
$G$, their product is just $gh$. Now, let $s$ and $s'$ be two elements from $S$. What happens
when $g$ is replaced by the equivalent element $sg$, and $h$ is replaced by the equivalent
element $s'h$? Well, since $S$ is normal, there is an $s'' \in S$ such that

$$s''g = gs'.$$

Thanks to associativity,

$$(sg)(s'h) = s(g(s'h)) = s((gs')h) = s((s''g)h) = s(s''(gh)) = (ss'')(gh).$$

Thus, the product $gh$ hasn’t changed much: it was just replaced by an equivalent
element.

In summary, thanks to normality, multiplication is invariant under the equivalence
relation. In other words, order doesn’t matter: switching to an equivalent element
and then multiplying is the same as multiplying and then switching to the relevant
equivalent element. In terms of equivalence classes, this could be written as

$$(Sg)(Sh) = S(gh).$$

In $G$, this means that the left-hand side is the same equivalence class as the right-hand
side. In $G/S$, on the other hand, this could be used as a new definition.
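
The role of normality can be seen by comparing a normal subgroup with a subgroup that is not normal. The sketch below assumes the permutations of three symbols, with `A3` (the three cyclic shifts) and `H` (the unit element and one swap) as sample subgroups; all names are illustrative helpers rather than notation from the text.

```python
from itertools import permutations


def compose(p, q):
    """Product pq of two permutations: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


G = set(permutations(range(3)))
A3 = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}   # the three cyclic shifts
H = {(0, 1, 2), (1, 0, 2)}               # the unit element and one swap


def right_coset(S, g):
    """The set Sg."""
    return frozenset(compose(s, g) for s in S)


def left_coset(S, g):
    """The set gS."""
    return frozenset(compose(g, s) for s in S)


def is_normal(S):
    """S is normal if Sg = gS for every g in G."""
    return all(right_coset(S, g) == left_coset(S, g) for g in G)


def product_is_well_defined(S):
    """Replacing g and h by equivalent elements must leave S(gh) unchanged."""
    return all(
        right_coset(S, compose(x, y)) == right_coset(S, compose(g, h))
        for g in G for h in G
        for x in right_coset(S, g) for y in right_coset(S, h)
    )
```

For `A3` both checks succeed, while for `H` both fail: without normality, the product of two classes would depend on the representatives picked from them.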


5.7.3. The Factor (Quotient) Group 


In $G/S$, the original equivalence classes are considered as elements. How to multiply
them with each other? Well, consider two elements of the form

$$Sg, Sh \in G/S,$$

for some $g, h \in G$. Their product in $G/S$ is now defined as

$$(Sg)(Sh) \equiv S(gh).$$

To be well-defined, this product mustn’t depend on the particular representatives $g$
or $h$: replacing each of them by an equivalent element mustn’t affect the result. Since
$S$ is normal, this is indeed the case.

Still, this is not the end of it. To be a group, $G/S$ must also be associative. To
check on this, consider three elements of the form

$$Sg, Sh, Sk \in G/S,$$

for some $g, h, k \in G$. Since $G$ is associative and $S$ is normal,

$$(SgSh)Sk = S(gh)Sk = S((gh)k) = S(g(hk)) = SgS(hk) = Sg(ShSk),$$

so $G/S$ is associative as well.
Still, this is not the end of it. To be a group, $G/S$ must also contain a unit element.
For this job, let’s choose

$$S = SI \in G/S.$$

Indeed, for any element $Sg \in G/S$,

$$(Sg)(SI) = S(gI) = Sg = S(Ig) = (SI)(Sg).$$

Still, this is not the end. To be a group, $G/S$ must also be closed under the inverse
operation. Fortunately, for every element of the form $Sg \in G/S$, the inverse is just
$Sg^{-1} \in G/S$. Indeed,

$$(Sg)\left(Sg^{-1}\right) = S\left(gg^{-1}\right) = SI = S.$$

This is the end of it: $G/S$ is indeed a legitimate group. Let’s use it to modify
the original homomorphism $\xi$ defined in Sect. 5.4.2, and make a new one-to-one
isomorphism.


5.7.4 Isomorphism 


Consider again the homomorphism

$$\xi : G \to M.$$

As pointed out in Sect. 5.4.2, $\xi$ is not necessarily one-to-one, so it is not necessarily
invertible: there may be some element $m \in M$ with more than one element in its
origin: $|\xi^{-1}(m)| > 1$.

Fortunately, $\xi$ can still be modified to produce a one-to-one homomorphism:
an isomorphism. To do this, consider the kernel $\xi^{-1}(i)$, defined in Sect. 5.4.5. As
discussed in Sect. 5.5.3, this is a legitimate subgroup of $G$. Therefore, it can be
substituted for $S$ above, and induce a new equivalence relation in $G$. The equivalence
classes can then be placed in the new set

$$G/\xi^{-1}(i).$$

Moreover, if the kernel is normal in the sense that it commutes with every element
in $G$:

$$\xi^{-1}(i)g = g\xi^{-1}(i), \quad g \in G,$$

then this is not just a set but actually a new group: the factor group (Sects. 5.7.2 and 5.7.3).
Fortunately, $\xi$ doesn’t distinguish between equivalent elements in $G$. For example,
if $g$ and $g'$ are equivalent to each other, then there must be an element $s \in \xi^{-1}(i)$ for
which

$$g' = sg$$

(Sect. 5.6.5). Therefore, we must have

$$\xi(g') = \xi(sg) = \xi(s)\xi(g) = i\xi(g) = \xi(g).$$

Thus, $\xi$ maps the entire equivalence class to one and the same element in $M$. This
observation can now be used to form a new one-to-one mapping from the factor group
$G/\xi^{-1}(i)$ onto $M$. In this new mapping, the entire equivalence class is mapped as a
whole to its image: some element in $M$. This is indeed invertible: this image element
could simply map back to the original equivalence class.

More precisely, the new isomorphism

$$\Xi : G/\xi^{-1}(i) \to M$$

is defined by

$$\Xi\left(\xi^{-1}(i)g\right) \equiv \xi(g), \quad g \in G$$

(Fig. 5.6). Why is this well-defined? Because it doesn’t depend on the particular
representative $g$ picked arbitrarily from the equivalence class. After all, one could
pick any equivalent element $g' \sim g$, and still have

$$\Xi\left(\xi^{-1}(i)g'\right) = \xi(g') = \xi(g),$$

as shown above.

Fig. 5.6 The new mapping $\Xi$ maps disjoint equivalence classes (or distinct elements in the factor group) to distinct elements in $M$


5.7.5 The Fundamental Theorem of Homomorphism 


Like the original homomorphism $\xi$, $\Xi$ is onto $M$. Indeed, for every element $m \in M$,
there is an element $g \in G$ for which $\xi(g) = m$. Therefore,

$$\Xi\left(\xi^{-1}(i)g\right) = \xi(g) = m,$$

as required.
Furthermore, $\Xi$ is one-to-one. Indeed, consider two distinct elements $\xi^{-1}(i)g$ and
$\xi^{-1}(i)g'$ in $G/\xi^{-1}(i)$. Clearly, $g' \not\sim g$, so $g'g^{-1} \notin \xi^{-1}(i)$, so

$$\xi(g')(\xi(g))^{-1} = \xi(g')\xi\left(g^{-1}\right) = \xi\left(g'g^{-1}\right) \neq i$$

(Sect. 5.4.4). As a result,

$$\Xi\left(\xi^{-1}(i)g\right) = \xi(g) \neq \xi(g') = \Xi\left(\xi^{-1}(i)g'\right),$$

as asserted. In summary, $\Xi$ is indeed invertible.

Fig. 5.7 The new isomorphism $\Xi$ from the factor group $G/\xi^{-1}(i)$ onto $M$ (the horizontal arrows) preserves (or mirrors) the algebraic operations (the vertical arrows: product in $G/\xi^{-1}(i)$ on the left, product in $M$ on the right)

Fortunately, $\Xi$ also preserves algebraic operations (Fig. 5.7). Indeed, for every
two elements $g, g' \in G$, we have

$$\begin{aligned}
\Xi\left(\left(\xi^{-1}(i)g\right)\left(\xi^{-1}(i)g'\right)\right) &= \Xi\left(\xi^{-1}(i)(gg')\right) \\
&= \xi(gg') \\
&= \xi(g)\xi(g') \\
&= \Xi\left(\xi^{-1}(i)g\right)\Xi\left(\xi^{-1}(i)g'\right).
\end{aligned}$$

In summary, the factor group $G/\xi^{-1}(i)$ is isomorphic to $M$:

$$G/\xi^{-1}(i) \simeq M.$$

Thus, these groups are exactly mirrored by each other, and have the same algebraic
structure.
This is the fundamental theorem of homomorphism, or the first isomorphism
theorem. Later on, we’ll use it to prove two other important theorems: the second
and third isomorphism theorems. Before doing this, however, we use it in our original
application: Moebius transformations.
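
Before turning to that application, the theorem can be watched at work on a tiny commutative example. The sketch below assumes the group {1, ..., 6} under multiplication modulo 7 and the squaring map as the homomorphism; the group, the map, and the names `xi`, `Xi`, `coset`, and `coset_product` are illustrative choices, not taken from the text.

```python
p = 7
G = set(range(1, p))   # nonzero residues, multiplied modulo 7


def xi(g):
    """Squaring modulo 7: a homomorphism here, since (gh)^2 = g^2 h^2."""
    return (g * g) % p


M = {xi(g) for g in G}                            # the image, so xi is onto M
kernel = frozenset(g for g in G if xi(g) == 1)    # 1 is the unit element of M


def coset(g):
    """The equivalence class kernel * g."""
    return frozenset((s * g) % p for s in kernel)


factor_group = {coset(g) for g in G}


def Xi(c):
    """Map a whole equivalence class to the common image of its elements."""
    (image,) = {xi(g) for g in c}   # unpacking fails if there were two images
    return image


def coset_product(c1, c2):
    """Multiply two classes through arbitrarily picked representatives."""
    return coset((min(c1) * min(c2)) % p)


# Xi is one-to-one and onto: three classes go to three distinct images.
images = {Xi(c) for c in factor_group}
# Xi preserves the product.
preserves_product = all(
    Xi(coset_product(c1, c2)) == (Xi(c1) * Xi(c2)) % p
    for c1 in factor_group for c2 in factor_group
)
```

Since this group is commutative, the kernel is automatically normal; the noncommutative case needs the normality condition of Sect. 5.7.2.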


5.8 Geometrical Applications 


5.8.1 Application in Moebius Transformations 


Let’s apply the above theory to the special case in which $G$ is the group of 2 x 2
nonsingular complex matrices, and $M$ is the group of invertible Moebius
transformations (Sect. 5.1.4). In this case, the unit element $I \in G$ is the 2 x 2 identity matrix

$$I \equiv \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

and the unit element $i \in M$ is the identity mapping

$$z \to z \quad \text{and} \quad \infty \to \infty,$$

for which $a = d \neq 0$ and $b = c = 0$ in Sect. 5.1.2.
In this case, what is the center of $G$? Well, it contains the nonzero scalar multiples
of the 2 x 2 identity matrix:

$$C \equiv \{ c \in G \mid cg = gc \text{ for all } g \in G \} = \{ zI \mid z \in \mathbb{C}, \; z \neq 0 \}$$

(see exercises below). From Sect. 5.5.2, $C$ is indeed a subgroup of $G$.
Our job is to design a suitable homomorphism

$$\xi : G \to M,$$

with the kernel

$$\xi^{-1}(i) = C.$$

For this purpose, we need some geometrical preliminaries.


5.8.2. Two-Dimensional Vector Set 


Let us use the above center subgroup $C \subset G$ to define an equivalence relation in the
set

$$V \equiv \mathbb{C}^2 \setminus \{ (0, 0)^t \}.$$

Here, ‘$\setminus$’ means “minus” the set that contains the origin. Thus, $V$ contains the nonzero
two-dimensional complex vectors. Although $V$ is not a group but a mere set, it can
still be decomposed in terms of disjoint equivalence classes, as in Sects. 5.6.1-5.6.3.
For every two vectors $v, v' \in V$, let

$$v' \sim v \quad \text{if} \quad v' = cv \text{ for some } c \in C.$$

Since $C$ is defined as in Sect. 5.8.1, this means that $v'$ is just a nonzero scalar multiple
of $v$. Still, in principle, the same could be done with other normal subgroups as well.
It is easy to see that this is an equivalence relation in the original set $V$. Indeed,

• for every $v \in V$, $v = Iv$. This shows reflexivity.

• Furthermore, for every $v, v' \in V$, if $v' = cv$ (for some $c \in C$), then $v = c^{-1}v'$.
This shows symmetry.

• Finally, for every $v, v', v'' \in V$, if $v' = cv$ and $v'' = c'v'$ (for some $c, c' \in C$),
then $v'' = c'(cv) = (c'c)v$. This shows transitivity.

This proves that the above relation is indeed a legitimate equivalence relation in $V$.

Fig. 5.8 A picture of $\mathbb{C}^2$: the two-dimensional complex vector $(c_1, c_2)^t$ spans an oblique plane $\{ z(c_1, c_2)^t \mid z \in \mathbb{C} \}$, the equivalence class $C(c_1, c_2)^t$. Also shown: the horizontal complex plane $\{ (z, 0) \mid z \in \mathbb{C} \}$ and the vertical complex plane $\{ (0, z) \mid z \in \mathbb{C} \}$


5.8.3 Geometrical Decomposition into Planes 


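Consider the nonzero two-dimensional complex vector

$$v \equiv \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \in V.$$

What is its equivalence class? Well, it takes the form

$$Cv = \{ cv \mid c \in C \} = \{ zv \mid z \in \mathbb{C}, \; z \neq 0 \} = \left\{ z\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \;\middle|\; z \in \mathbb{C}, \; z \neq 0 \right\}.$$

In geometrical terms, this is just the oblique plane spanned by the vector $v = (c_1, c_2)^t$
(Fig. 5.8).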


5.8.4 Family of Planes 


Together, all such planes make the family

$$V/C \equiv \{ Cv \mid v \in V \}.$$

To avoid duplication, each individual plane of the form $Cv$ appears only once, with $v$
being some representative picked arbitrarily from it. Note that, unlike in Sect. 5.7.3,
here $V/C$ is just a set, not a group. This is because the original set $V$ is not a group
in the first place.

In $V$, $Cv$ is a subset: an oblique plane (Fig. 5.8). In the new set $V/C$, on the
other hand, $Cv$ is just an element. To obtain the original set $V$ once again, one must,
therefore, apply the union operation, to pick the inner vectors from each plane:

$$V = \bigcup_{Cv \in V/C} Cv$$

(Sect. 5.6.3).


5.8.5 Action of Factor Group 


The original group $G$ acts on the set $V$: each element $g \in G$ acts on each $v \in V$,
transforming it into the new vector $gv$. Thanks to the above decomposition, this also
applies to complete planes: the factor group $G/C$ acts on $V/C$.

Indeed, an element of the form $Cg \in G/C$ acts not only on individual vectors of
the form $v \in V$ but also on complete planes of the form $Cv$:

$$Cg(Cv) \equiv C(gv).$$

Why is this a legitimate definition? Because it is independent of the particular
representative $g$ or $v$. Indeed, since $C$ is normal, replacing $g$ by $cg$ and $v$ by $c'v$ (for some
$c, c' \in C$) changes nothing:

$$C(cg)(C(c'v)) = C(cgc'v) = C(cc'gv) = C(gv).$$


5.8.6 Composition of Functions 


Thanks to the above action, each element of the form $Cg \in G/C$ can also be
interpreted as a function

$$Cg : V/C \to V/C.$$

After all, the original algebraic operation in $G/C$, defined in Sect. 5.7.3 as

$$(Cg')(Cg) = C(g'g) \quad (g, g' \in G),$$

is mirrored well by function composition:

$$(Cg' \circ Cg)(Cv) = Cg'(Cg(Cv)) = Cg'(C(gv)) = C(g'gv) = C(g'g)(Cv),$$

for every $v \in V$.

Fig. 5.9 The oblique projection $P$ projects the oblique plane $C(c_1, c_2)^t$ to $c_1/c_2$ on the horizontal plane $\{ (z, 1) \mid z \in \mathbb{C} \}$. In particular, the horizontal complex plane $\{ (z, 0) \mid z \in \mathbb{C} \}$ projects to $\infty$


5.8.7 Oblique Projection: Extended Cotangent 


Let us define the oblique projection

$$P : V/C \to \mathbb{C} \cup \{\infty\}$$

by

$$P\left(C\begin{pmatrix} c_1 \\ c_2 \end{pmatrix}\right) \equiv \begin{cases} c_1/c_2 & \text{if } c_2 \neq 0 \\ \infty & \text{if } c_2 = 0. \end{cases}$$

Fortunately, this definition is independent of the particular representative $(c_1, c_2)^t$.
After all, for every nonzero complex number z, one may replace $c_1$ by $zc_1$ and $c_2$ by
$zc_2$, and still have the same projection.

In geometrical terms, P can be viewed as an oblique projection on the horizontal
plane
$$\{(z, 1) \mid z \in \mathbb{C}\}$$
(Fig. 5.9). This way, P actually extends the standard cotangent projection, to apply
not only to real numbers but also to complex numbers.


The inverse mapping
$$P^{-1} : \mathbb{C} \cup \{\infty\} \to V/C$$
can now be defined simply by

$$P^{-1}(z) \equiv \begin{cases} C\begin{pmatrix} z \\ 1 \end{pmatrix} & \text{if } z \in \mathbb{C} \\[2mm] C\begin{pmatrix} 1 \\ 0 \end{pmatrix} & \text{if } z = \infty. \end{cases}$$

In what sense is this the inverse? Well, in two senses: on one hand,

$$P^{-1}P = CI = C$$

is the unit element in G/C that leaves V/C unchanged: each oblique plane projects,
and then unprojects. On the other hand,

$$PP^{-1} = i$$
is the identity transformation in Sect. 5.8.1: each complex number unprojects, and
then projects back.

Let’s use $P$ and $P^{-1}$ to associate the original Moebius transformation with the
relevant 2 x 2 matrix.


5.8.8 Homomorphism Onto Moebius Transformations 


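The association made in Sect. 5.1.4 now takes the form of a new homomorphism
$$\xi : G \to M,$$

from the group of 2 x 2 nonsingular complex matrices, onto the group of invertible
Moebius transformations:

$$\xi(g) \equiv PCgP^{-1} \quad (g \in G).$$
Why is this a Moebius transformation? Well, look what happens to a complex number
z: it transforms to

$$z \to PCgP^{-1}z,$$

or, in three stages,
$$z \to P^{-1}z \to CgP^{-1}z \to PCgP^{-1}z.$$

In other words, z first unprojects to an oblique plane:

$$z \to C\begin{pmatrix} z \\ 1 \end{pmatrix},$$

which is then multiplied from the left by

$$Cg = C\begin{pmatrix} a & b \\ c & d \end{pmatrix},$$

and projects back:
$$C\begin{pmatrix} a & b \\ c & d \end{pmatrix} C\begin{pmatrix} z \\ 1 \end{pmatrix} = C\left(\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} z \\ 1 \end{pmatrix}\right) = C\begin{pmatrix} az+b \\ cz+d \end{pmatrix} \to \frac{az+b}{cz+d},$$
as required.
Furthermore, what happens to the infinity point? Well, it first unprojects to a
horizontal plane:
$$\infty \to C\begin{pmatrix} 1 \\ 0 \end{pmatrix},$$
which then transforms by
$$C\begin{pmatrix} a & b \\ c & d \end{pmatrix} C\begin{pmatrix} 1 \\ 0 \end{pmatrix} = C\left(\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix}\right) = C\begin{pmatrix} a \\ c \end{pmatrix},$$

which then projects back:

$$C\begin{pmatrix} a \\ c \end{pmatrix} \to \frac{a}{c},$$

as required.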
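
The three stages above can be sketched numerically. This is only an illustration under stated assumptions, not part of the original text: the names `INFINITY`, `project`, `unproject`, and `moebius_from_matrix` are hypothetical, each plane is stored by one representative (c1, c2), and the infinity point is represented by a floating-point sentinel.

```python
INFINITY = float("inf")  # sentinel for the infinity point (an assumption of this sketch)


def project(c1, c2):
    """P: map the plane C(c1, c2)^t to c1/c2, or to infinity if c2 == 0."""
    if c1 == 0 and c2 == 0:
        raise ValueError("(0, 0) is not in V")
    return INFINITY if c2 == 0 else c1 / c2


def unproject(z):
    """P^{-1}: return one representative (c1, c2) of the plane that projects to z."""
    return (1, 0) if z == INFINITY else (z, 1)


def moebius_from_matrix(a, b, c, d):
    """xi(g) = P Cg P^{-1} for the nonsingular matrix g = [[a, b], [c, d]]."""
    if a * d - b * c == 0:
        raise ValueError("g must be nonsingular")

    def m(z):
        c1, c2 = unproject(z)                      # z -> C(z, 1)^t, or C(1, 0)^t
        return project(a * c1 + b * c2,            # multiply by g from the left,
                       c * c1 + d * c2)            # then project back

    return m


m = moebius_from_matrix(2, 1, 1, -1)   # z -> (2z + 1) / (z - 1)
print(m(1j))         # a finite complex number
print(m(INFINITY))   # a / c
print(m(1))          # the pole z = -d/c maps to the infinity point
```

Working with a representative (c1, c2) rather than with z itself is what lets the pole and the infinity point be treated by the same two lines of code.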
Let us show that $\xi$ is indeed a legitimate homomorphism. First, is it onto M? Well,
as discussed in Sect. 5.1.4, every invertible Moebius transformation $m \in M$ has the
explicit form
$$z \to \frac{az+b}{cz+d},$$
for some complex parameters a, b, c, and d, satisfying
$$ad - bc \neq 0.$$
As discussed above, this transformation could also be decomposed as
$$m = \xi(g) = PCgP^{-1},$$

where

$$g \equiv \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

is a nonsingular matrix, with a nonzero determinant. So, $\xi$ maps g to m, as required.
Finally, does $\xi$ preserve algebraic operations? Well, thanks to the associativity of
function composition (Sects. 5.8.5-5.8.6),

$$\begin{aligned}
\xi(g')\xi(g) &= (PCg'P^{-1})(PCgP^{-1}) \\
&= (PCg')(P^{-1}P)(CgP^{-1}) \\
&= P(Cg')(CI)(Cg)P^{-1} \\
&= P(Cg')(Cg)P^{-1} \\
&= PC(g'g)P^{-1} \\
&= \xi(g'g).
\end{aligned}$$

This proves that $\xi$ is indeed a legitimate homomorphism, as asserted.
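
The identity just proved can be spot-checked numerically. The sketch below is illustrative only: `moebius` and `matmul` are hypothetical helper names, the matrices and the test point are arbitrary choices, and poles and the infinity point are deliberately ignored.

```python
def moebius(g):
    """The map z -> (a z + b) / (c z + d) for g = ((a, b), (c, d)); finite z only."""
    (a, b), (c, d) = g
    return lambda z: (a * z + b) / (c * z + d)


def matmul(g1, g2):
    """Product of two 2 x 2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(g1[i][k] * g2[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


g = ((1, 2j), (3, 4))               # determinant 4 - 6j, nonzero
g_prime = ((0, 1), (1, 1 + 1j))     # determinant -1, nonzero
z = 0.3 - 0.7j

composed = moebius(g_prime)(moebius(g)(z))      # xi(g') applied after xi(g)
via_product = moebius(matmul(g_prime, g))(z)    # xi(g'g)
print(abs(composed - via_product))              # rounding error only
```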


5.8.9 The Kernel 


To design a proper isomorphism as well, we must also have the kernel of $\xi$. Fortunately,
this is just the center of G:

$$\xi^{-1}(i) = C = \{zI \mid z \in \mathbb{C},\ z \neq 0\}$$
(defined in Sect. 5.8.1). Let’s prove this in two stages. First, let’s show that
$$C \subset \xi^{-1}(i).$$

Indeed, for every $c \in C$, $\xi(c)$ is just the identity transformation that leaves every
complex number unchanged:

$$\xi(c)(z) = PCcP^{-1}(z) = PCcC\begin{pmatrix} z \\ 1 \end{pmatrix} = PC\begin{pmatrix} z \\ 1 \end{pmatrix} = z,$$
and also leaves the infinity point unchanged:
$$\xi(c)(\infty) = PCcP^{-1}(\infty) = PCcC\begin{pmatrix} 1 \\ 0 \end{pmatrix} = PC\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \infty.$$
Thus, $\xi(c)$ is indeed the identity mapping, or the unit element in M:

$$\xi(c) = i \in M,$$

so
$$C \subset \xi^{-1}(i),$$

as asserted.
Conversely, let us also show that

$$\xi^{-1}(i) \subset C.$$

Indeed, if $\xi(g)$ is the identity transformation $z \to z$, then g must be a nonzero scalar
multiple of the 2 x 2 identity matrix (see exercises below).
In summary,

$$\xi^{-1}(i) = C,$$
as asserted. This means that $\xi$ doesn’t distinguish between matrices that are a nonzero
scalar multiple of each other. In view of the discussion in Sect. 5.1.4, $\xi$ is indeed a
good candidate to represent invertible Moebius transformations in terms of 2 x 2
nonsingular complex matrices.
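
As a small illustration (a sketch with hypothetical helper names `moebius` and `scale`, and arbitrarily chosen numbers), one may check that a matrix and a nonzero scalar multiple of it give the same Moebius transformation at a finite point:

```python
def moebius(g):
    """The map z -> (a z + b) / (c z + d) for g = ((a, b), (c, d)); finite z only."""
    (a, b), (c, d) = g
    return lambda z: (a * z + b) / (c * z + d)


def scale(lam, g):
    """The scalar multiple lam * g of a 2 x 2 matrix stored as nested tuples."""
    return tuple(tuple(lam * entry for entry in row) for row in g)


g = ((1, 2), (3, 4))
lam = 2 - 5j            # any nonzero complex scalar
z = 0.5 + 1j

difference = abs(moebius(g)(z) - moebius(scale(lam, g))(z))
print(difference)       # rounding error only: the scalar cancels in the ratio
```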


5.8.10 Eigenvectors and Fixed Points 


If $v \in V$ is an eigenvector of $g \in G$ with the eigenvalue $\lambda \in \mathbb{C}$:
$$gv = \lambda v,$$

then Cv contains eigenvectors only. After all, each element in Cv is a nonzero scalar
multiple of v, so it must be an eigenvector as well, with the same eigenvalue $\lambda$.
Furthermore, since g is nonsingular, $\lambda$ must be nonzero.

In this case, Cv is a fixed point that remains unchanged under the action of Cg.
Indeed, from the definitions in Sect. 5.8.5,

$$Cg(Cv) = C(gv) = C(\lambda v) = Cv.$$

Furthermore, in this case, PCv is a fixed point that remains unchanged under the
Moebius transformation $\xi(g)$:

$$\xi(g)(PCv) = PCgP^{-1}PCv = PCgCv = PCv.$$
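
To make the link between eigenvectors and fixed points concrete, here is a minimal sketch. It assumes c ≠ 0, so that both fixed points are finite, and uses the observation that v = (λ − d, c)^t is then an eigenvector for the eigenvalue λ; the function name `fixed_points` and the sample matrix are hypothetical.

```python
import cmath


def fixed_points(a, b, c, d):
    """Fixed points of z -> (a z + b) / (c z + d), read off from eigenvectors of g.

    Assumes c != 0, so both fixed points are finite complex numbers.
    """
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    eigenvalues = [(trace + root) / 2, (trace - root) / 2]
    # For each eigenvalue lam, v = (lam - d, c)^t is an eigenvector of g,
    # and the plane Cv projects to P(Cv) = (lam - d) / c.
    return [(lam - d) / c for lam in eigenvalues]


a, b, c, d = 2, 1, 1, 1
points = fixed_points(a, b, c, d)
residuals = [abs((a * z + b) / (c * z + d) - z) for z in points]
print(points)
print(residuals)   # each residual should be at rounding-error level
```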


5.8.11 Isomorphism Onto Moebius Transformations 


Let us now use the fundamental theorem of homomorphism (Sects. 5.7.4-5.7.5) to
design a new isomorphism
$$\Xi : G/C \to M.$$

Naturally, it is defined by
$$\Xi(Cg) \equiv \xi(g) = PCgP^{-1}, \quad Cg \in G/C.$$

This way, $\Xi$ doesn’t distinguish between matrices that are a nonzero scalar multiple
of each other: it views them as one element in G/C, and maps them as a whole to
the same Moebius transformation.

This is indeed a proper group representation. To see this, look at things the other
way around. A Moebius transformation is not easy to store on the computer. To do
this, use $\Xi^{-1}$. In fact, each element $m \in M$ is mirrored (or represented) by the unique
element
$$\Xi^{-1}(m) = P^{-1}mP = C\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in G/C.$$

After all, a matrix is easy to store on the computer. Furthermore, $\Xi^{-1}$ also preserves
the algebraic operation: it mirrors it by matrix product, easy to calculate on the
computer.


5.9 Application in Continued Fractions 


5.9.1 Continued Fractions 


Let us use the above theory to define a continued fraction. For k = 1, 2, 3, ...,
consider the Moebius transformations

$$m_k(z) \equiv \frac{a_k}{z + b_k},$$

where $a_k$ and $b_k$ are some nonzero complex numbers (known as the coefficients).

For n = 1, 2, 3, ..., consider the compositions
$$\begin{aligned}
f_1 &\equiv m_1, \\
f_2 &\equiv m_1 \circ m_2, \\
f_3 &\equiv m_1 \circ m_2 \circ m_3, \\
&\;\;\vdots \\
f_n &\equiv m_1 \circ m_2 \circ m_3 \circ \cdots \circ m_n.
\end{aligned}$$
As a matter of fact, this is just a mathematical induction:

$$f_n \equiv \begin{cases} m_1 & \text{if } n = 1 \\ f_{n-1} \circ m_n & \text{if } n > 1. \end{cases}$$

Now, let’s apply these functions to z = 0:

$$f_1(0),\ f_2(0),\ f_3(0),\ \ldots.$$

These are the approximants. We say that they converge (in the wide sense) to the
continued fraction f if
$$f_n(0) \to_{n \to \infty} f \in \mathbb{C} \cup \{\infty\}$$

[27, 28]. In particular, if $f \in \mathbb{C}$ is a concrete number, then the convergence is in the
strict sense as well. If, on the other hand, $f = \infty$ is no number, then the convergence
is in the wide sense only.
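
The approximants can be evaluated directly from the definition. The following is a minimal sketch, assuming real coefficients that never produce a zero denominator; the function name `approximant` and the choice a_k = b_k = 1 are illustrative only.

```python
def approximant(a, b, n):
    """f_n(0) = m_1(m_2(... m_n(0) ...)), where m_k(z) = a_k / (z + b_k).

    The lists a and b hold a_1, a_2, ... and b_1, b_2, ... (0-based in Python).
    Division by zero (an approximant equal to infinity) is not handled here.
    """
    z = 0.0
    for k in reversed(range(n)):    # m_n is applied first, m_1 last
        z = a[k] / (z + b[k])
    return z


a = [1.0] * 30
b = [1.0] * 30
approximants = [approximant(a, b, n) for n in range(1, 31)]
print(approximants[:4])
print(approximants[-1])   # for this choice the limit solves f = 1 / (1 + f)
```

For a_k = b_k = 1, the limit f must satisfy f = 1/(1 + f), so the approximants approach (√5 − 1)/2.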
5.9.2 Algebraic Formulation


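To study the convergence, let’s use the factor group

$$G/C \simeq M,$$
with the isomorphism $\Xi$ in Sect. 5.8.11. For this purpose, let’s define the new matrices,
as in Sect. 5.1.4:
$$g_k \equiv \begin{pmatrix} 0 & a_k \\ 1 & b_k \end{pmatrix}.$$
This way,

$$m_k = \Xi(Cg_k) = PCg_kP^{-1}.$$

For example, both sides of this equation could be applied to the complex number
$0 \in \mathbb{C}$:

$$m_k(0) = PCg_kP^{-1}(0) = PCg_kC\begin{pmatrix} 0 \\ 1 \end{pmatrix} = PCg_k\begin{pmatrix} 0 \\ 1 \end{pmatrix} = PC\begin{pmatrix} a_k \\ b_k \end{pmatrix} = \frac{a_k}{b_k},$$

in agreement with the original definition of $m_k$.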


5.9.3 The Approximants 


Let us use the isomorphism $\Xi$ to obtain the composition $f_n$ as well:

$$\begin{aligned}
\Xi(Cg_1g_2 \cdots g_n) &= \Xi(Cg_1Cg_2 \cdots Cg_n) \\
&= \Xi(Cg_1)\Xi(Cg_2) \cdots \Xi(Cg_n) \\
&= m_1 \circ m_2 \circ \cdots \circ m_n \\
&= f_n.
\end{aligned}$$

Both sides of this equation could now be applied to $0 \in \mathbb{C}$:

$$\begin{aligned}
PCg_1g_2 \cdots g_n\begin{pmatrix} 0 \\ 1 \end{pmatrix} &= PCg_1g_2 \cdots g_nP^{-1}(0) \\
&= \Xi(Cg_1g_2 \cdots g_n)(0) \\
&= f_n(0).
\end{aligned}$$

In other words, the approximant $f_n(0)$ is just the ratio between the upper right and
lower right elements in the matrix product $g_1g_2 \cdots g_n$:
$$f_n(0) = \frac{(g_1g_2 \cdots g_n)_{1,2}}{(g_1g_2 \cdots g_n)_{2,2}}.$$

This observation will be useful below.
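
This observation can be checked in exact arithmetic. The sketch below is an illustration with hypothetical names (`matmul`, `approximant_via_product`, `approximant_nested`) and arbitrarily chosen integer coefficients; it compares the matrix-product ratio with the nested evaluation of the compositions.

```python
from fractions import Fraction


def matmul(p, q):
    """Product of two 2 x 2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def approximant_via_product(a, b):
    """Upper right over lower right element of g_1 g_2 ... g_n."""
    product = ((1, 0), (0, 1))
    for a_k, b_k in zip(a, b):
        product = matmul(product, ((0, a_k), (1, b_k)))   # multiply by g_k from the right
    return Fraction(product[0][1], product[1][1])


def approximant_nested(a, b):
    """m_1(m_2(... m_n(0) ...)) with m_k(z) = a_k / (z + b_k)."""
    z = Fraction(0)
    for a_k, b_k in reversed(list(zip(a, b))):
        z = Fraction(a_k) / (z + b_k)
    return z


a, b = [1, 2, 3], [4, 5, 6]
via_product = approximant_via_product(a, b)
nested = approximant_nested(a, b)
print(via_product, nested)
```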


5.9.4 Algebraic Convergence 


Actually, this matrix product is defined by mathematical induction on n = 1, 2, 3, ...:

$$g_1g_2 \cdots g_n \equiv \begin{cases} g_1 & \text{if } n = 1 \\ (g_1g_2 \cdots g_{n-1})\, g_n & \text{if } n > 1. \end{cases}$$

Recall that $g_n$ is a special 2 x 2 matrix: its first column is just the standard unit
vector $(0, 1)^t$. For this reason, the above products also have a special property: the
first column in $g_1g_2 \cdots g_n$ is the same as the second column in $g_1g_2 \cdots g_{n-1}$:

$$g_1g_2 \cdots g_n\begin{pmatrix} 1 \\ 0 \end{pmatrix} = (g_1g_2 \cdots g_{n-1})\left(g_n\begin{pmatrix} 1 \\ 0 \end{pmatrix}\right) = g_1g_2 \cdots g_{n-1}\begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

Thus, if convergence indeed takes place as $n \to \infty$, then both columns in $g_1g_2 \cdots g_n$
must be nearly proportional to each other: the ratio between the upper and lower
components must approach the same limit f.

Note that these ratios remain unchanged upon multiplying the original product
$g_1g_2 \cdots g_n$ by a nonsingular diagonal matrix from the right. Thus, the continued
fraction f exists if and only if there exist diagonal matrices $D_n \in G$ for which

$$g_1g_2 \cdots g_nD_n \to_{n \to \infty} (v \mid v),$$

for some

$$v = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \in V.$$

What is the meaning of this convergence? Well, it is interpreted elementwise: there
are actually two independent limit processes here, one for the upper right element,
and another one for the lower right element. Thanks to this convergence, the required
continued fraction f can now be obtained by

$$f = \lim_{n \to \infty} f_n(0) = \lim_{n \to \infty} PCg_1g_2 \cdots g_n\begin{pmatrix} 0 \\ 1 \end{pmatrix} = PCv = \frac{c_1}{c_2} \in \mathbb{C} \cup \{\infty\}.$$

Thus, to guarantee convergence, one only needs to design suitable diagonal matrices
$D_n \in G$, in such a way that $g_1g_2 \cdots g_nD_n$ converge elementwise to a singular matrix
of the form $(v \mid v) \notin G$, for some $v \in V$. This can also be written more concisely
as
$$Cg_1g_2 \cdots g_n \to_{n \to \infty} (Cv \mid Cv).$$

This means that G/C is not closed: elements from it could converge to a limit outside
it [62]. In this case, if the second component in v is nonzero:

$$c_2 \neq 0,$$

then the convergence is also in the strict sense. Otherwise, it is in the wide sense
only.


5.10 Isomorphism Theorems 


5.10.1 The Second Isomorphism Theorem 


By now, we are rather experienced in “playing” with groups. Let’s use the funda- 
mental theorem of homomorphism (Sect.5.7.5) to prove another important theorem 
in group theory: the second isomorphism theorem, used later in projective geometry. 

For this purpose, let G be a group. Let $T \subset G$ be a subgroup (normal or not). Let
$S \subset G$ be a normal subgroup.

Consider $T \cap S$. Is it a legitimate group? Well, it certainly contains the unit element.
After all, $I \in T$, and $I \in S$. Now, is it closed under multiplication? Well, to check
on this, let $s, s' \in T \cap S$. In this case, $ss' \in T$, and $ss' \in S$, as required. Finally, is
it closed under the inverse operation? Well, to check on this, let $s \in T \cap S$. In this
case, $s^{-1} \in T$, and $s^{-1} \in S$, as required. In summary, $T \cap S$ is indeed a legitimate
subgroup of T.
Still, is it normal? Well, to check on this, let $s \in T \cap S$, and $t \in T$. Since $S \subset G$
is normal, there is an $s' \in S$ such that $st = ts'$. Fortunately, $s' = t^{-1}st \in T$, so
$s' \in T \cap S$, as required.
What is the product of T times S? Well, it contains those products of an element
from T with an element from S:

$$TS \equiv \{ts \mid t \in T,\ s \in S\} \subset G.$$
Is this a legitimate group? Well, it certainly contains the unit element. After all,
$I \in T$, and $I \in S$. Now, is it closed under multiplication? Well, to check on this, let
$ts, t's' \in TS$. Since S is normal, $st' = t's''$, for some $s'' \in S$. Thus,

$$(ts)(t's') = t(st')s' = t(t's'')s' = (tt')(s''s') \in TS,$$

as required.
Finally, is it closed under the inverse operation? Well, to check on this, let $ts \in TS$.
Now, since S is normal, $s^{-1}t^{-1} = t^{-1}s'''$, for some $s''' \in S$. Therefore,

$$(ts)(t^{-1}s''') = (ts)(s^{-1}t^{-1}) = t(ss^{-1})t^{-1} = tt^{-1} = I,$$
as required. So, TS is indeed a legitimate subgroup, although not necessarily normal.


If T was also normal, then TS would have been normal as well. Indeed, in this
case, for each $g \in G$ and $ts \in TS$, there would be $t' \in T$ and $s' \in S$ for which

$$(ts)g = t(sg) = t(gs') = (tg)s' = (gt')s' = g(t's'),$$

as required. For our purpose, however, we don’t need this, so T could be either
normal or not.

S, on the other hand, is normal not only in G but also in TS. To see this, let
$ts \in TS$, and $s' \in S$. Since $S \subset G$ is normal, and since $ts \in G$, there is an $s'' \in S$
such that $(ts)s' = s''(ts)$, as required.

The second isomorphism theorem says that

$$\frac{T}{T \cap S} \simeq \frac{TS}{S}.$$
To prove this, let’s use the fundamental theorem of homomorphism. For this purpose,
define the new homomorphism

$$\xi : T \to \frac{TS}{S} \quad \text{by} \quad \xi(t) \equiv St.$$

Is this a legitimate homomorphism? Well, is it onto? Fortunately, it certainly is: after
all, every element $Sts \in TS/S$ can also be written as $Sts = Ss't = St$ (for some $s' \in S$).
Furthermore, $\xi$ certainly preserves the original algebraic operation in T. So, it is
indeed a legitimate homomorphism.
Moreover, its kernel is $T \cap S$. So, we can now use the fundamental theorem of
homomorphism to obtain

$$\frac{T}{T \cap S} \simeq \frac{TS}{S},$$

as asserted.
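
The theorem can be illustrated on a small finite example. The choice below is an assumption of this sketch, not taken from the text: G is the symmetric group on four symbols, T is the subgroup that fixes the last symbol (not normal), and S is the normal Klein four-group. Permutations are stored as tuples, and `compose` and `coset` are hypothetical helper names.

```python
from itertools import permutations


def compose(p, q):
    """The permutation "p after q", with permutations stored as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))


def coset(N, g):
    """The coset Ng = {ng | n in N}, stored as a hashable frozenset."""
    return frozenset(compose(n, g) for n in N)


G = list(permutations(range(4)))                  # symmetric group on 4 symbols
T = [p for p in G if p[3] == 3]                   # fixes the last symbol; not normal in G
S = [(0, 1, 2, 3), (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)]   # Klein four-group, normal

TS = {compose(t, s) for t in T for s in S}
T_cap_S = [t for t in T if t in S]

left = {coset(T_cap_S, t) for t in T}     # the elements of T / (T ∩ S)
right = {coset(S, g) for g in TS}         # the elements of TS / S
image = {coset(S, t) for t in T}          # the image of t -> St

print(len(left), len(right), image == right)
```

Both factor groups have the same number of elements, and the map t → St reaches every coset in TS/S, in line with the proof above. Checking orders is of course weaker than checking the isomorphism itself.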


5.10.2. The Third Isomorphism Theorem 


Finally, let’s use the fundamental theorem of homomorphism to prove yet another
important theorem in group theory: the third isomorphism theorem. Thanks to this
theorem, groups may sometimes behave just like simple fractions: a common factor
can be “canceled out.”
Let G be a group. Let $S, T \subset G$ be two normal subgroups. Assume also that
$$S \subset T.$$

Note that S is normal not only in G but also in T. Indeed, let $t \in T$, and $s \in S$.
Since $t \in G$, there is an $s' \in S$ such that $st = ts'$.

Now, consider $T/S \subset G/S$. Is it a legitimate subgroup? Well, it certainly contains
the unit element: S. Still, is it closed under multiplication? To check on this, let
$St, St' \in T/S$. Fortunately, $StSt' = S(tt')$ is in T/S as well, as required.

Furthermore, is it closed under the inverse operation? Fortunately, it is: the inverse
element $St^{-1}$ is in T/S as well.

So, $T/S \subset G/S$ is a legitimate subgroup. Is it normal? Well, to check on this, let
$g \in G$, and $t \in T$. Since $T \subset G$ is normal, there is a $t' \in T$ such that

$$StSg = S(tg) = S(gt') = SgSt',$$

as required.
The third isomorphism theorem tells us that, as in simple fractions, S could be
“canceled out:”
$$G/T \simeq \frac{G/S}{T/S}.$$


To prove this, let’s use the fundamental theorem of homomorphism. For this purpose,
define the new homomorphism

$$\xi : G \to G/S \quad \text{by} \quad \xi(g) \equiv Sg, \quad g \in G.$$
On top of this, define yet another homomorphism:

$$\xi' : G/S \to \frac{G/S}{T/S} \quad \text{by} \quad \xi'(Sg) \equiv (T/S)Sg, \quad Sg \in G/S.$$

Now, consider the composite homomorphism

$$\xi'\xi : G \to \frac{G/S}{T/S}.$$

What is its kernel? Well, it is just T! Indeed, on one hand, it includes T. After all, in
T/S, a typical element is of the form St (for some $t \in T$). Therefore,

$$\xi'\xi(t) = \xi'(St) = (T/S)St = T/S,$$

which is just the unit element in (G/S)/(T/S). On the other hand, the kernel of $\xi'\xi$
is also included in T. After all, if $g \in G \setminus T$, then

$$\xi'\xi(g) = \xi'(Sg) = (T/S)Sg \neq T/S,$$

because $Sg \notin T/S$.
Thanks to the fundamental theorem of homomorphism, we therefore have

$$G/T \simeq \frac{G/S}{T/S},$$

as asserted. In the next chapter, we’ll use the isomorphism theorems in projective
geometry.
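
A tiny commutative example may help to see the “cancellation” at work. The sketch below is an illustration under stated assumptions: G is the additive group of integers modulo 12 (so the group operation is written as addition rather than multiplication), T contains the even residues, and S = {0, 6}. The names `coset` and `add_cosets` are hypothetical.

```python
n = 12
G = list(range(n))
T = [g for g in G if g % 2 == 0]     # {0, 2, 4, 6, 8, 10}
S = [g for g in G if g % 6 == 0]     # {0, 6}, contained in T


def coset(N, g):
    """The coset N + g, stored as a hashable frozenset."""
    return frozenset((x + g) % n for x in N)


def add_cosets(X, Y):
    """Sum of two cosets of S, computed from one representative of each."""
    return coset(S, min(X) + min(Y))


G_mod_T = {coset(T, g) for g in G}
G_mod_S = {coset(S, g) for g in G}
T_mod_S = frozenset(coset(S, t) for t in T)      # also the unit element of (G/S)/(T/S)

# The elements of (G/S)/(T/S): cosets of T/S inside G/S
double_quotient = {frozenset(add_cosets(X, Y) for X in T_mod_S) for Y in G_mod_S}

# The kernel of the composite map g -> (T/S)(S + g)
kernel = [g for g in G
          if frozenset(add_cosets(X, coset(S, g)) for X in T_mod_S) == T_mod_S]

print(len(G_mod_T), len(double_quotient), kernel == T)
```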


5.11 Exercises 


1. Recall that G is the group of 2 x 2 nonsingular complex matrices. Let A and B
be two matrices in G, denoted by
$$A = \begin{pmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2} \end{pmatrix}.$$
Assume that B is also a diagonal matrix:
$$b_{1,2} = b_{2,1} = 0.$$
Show that the upper right element in the product AB is
$$(AB)_{1,2} = b_{2,2}a_{1,2}.$$
2. Show that the upper right element in the product BA is
$$(BA)_{1,2} = b_{1,1}a_{1,2}.$$
3. Assume also that A and B commute with each other:
$$AB = BA.$$
Conclude that
$$b_{2,2}a_{1,2} = (AB)_{1,2} = (BA)_{1,2} = b_{1,1}a_{1,2}.$$
4. Assume also that B is nonconstant:
$$b_{1,1} \neq b_{2,2}.$$
Conclude that A must be lower triangular:
$$a_{1,2} = 0.$$
5. Similarly, show that the lower left element in the product AB is
$$(AB)_{2,1} = b_{1,1}a_{2,1}.$$
6. Similarly, show that the lower left element in the product BA is
$$(BA)_{2,1} = b_{2,2}a_{2,1}.$$
7. Conclude that if A and B commute with each other, then
$$b_{1,1}a_{2,1} = (AB)_{2,1} = (BA)_{2,1} = b_{2,2}a_{2,1}.$$
8. Conclude that if B is also nonconstant, then A must be upper triangular:
$$a_{2,1} = 0.$$
9. Conclude that, if A commutes with the nonconstant diagonal matrix B, then A
must be diagonal as well:
$$a_{1,2} = a_{2,1} = 0.$$
10. Conclude that if A commutes with every matrix $B \in G$, then A must be diagonal.
11. Conclude that the center of G may contain diagonal matrices only.
12. Show that the diagonal matrices make a subgroup in G.
13. Conclude that the center must be a subgroup of that subgroup.
14. Assume now that A is diagonal:
$$a_{1,2} = a_{2,1} = 0,$$
and B is not necessarily diagonal. Show that the upper right element in the
product AB is
$$(AB)_{1,2} = a_{1,1}b_{1,2}.$$
15. Similarly, show that the upper right element in the product BA is
$$(BA)_{1,2} = a_{2,2}b_{1,2}.$$
16. Conclude that if A commutes with B, then
$$a_{1,1}b_{1,2} = (AB)_{1,2} = (BA)_{1,2} = a_{2,2}b_{1,2}.$$
17. Conclude that if B is not lower triangular:
$$b_{1,2} \neq 0,$$
then A must be constant:
$$a_{1,1} = a_{2,2}.$$
18. Conclude that if A commutes with every matrix $B \in G$, then A must be a
constant diagonal matrix.
19. Conclude that the center of G may contain constant diagonal matrices only.
20. Conclude that the center of G may contain only nonzero scalar multiples of the
identity matrix $I \in G$.
21. Show that the center of G contains all nonzero scalar multiples of $I \in G$.
22. Show that this is indeed a subgroup of G.
23. Recall that M is the group of invertible Moebius transformations, with composition
as the algebraic operation. Show that the identity mapping
$$i(z) \equiv z = \frac{az+b}{cz+d}$$
is indeed the unique unit element in M:
$$im = mi = m, \quad m \in M.$$
(Recall that i has nothing to do with the imaginary number $\sqrt{-1}$, denoted often
by the same letter i.)
24. Show that, in the above formulation of i, in the numerator, we must have $a \neq 0$.
25. Show also that, in the above formulation of i, in the numerator, we must have
b = 0. Hint:
$$i\left(-\frac{b}{a}\right) = \frac{a\left(-\frac{b}{a}\right) + b}{c\left(-\frac{b}{a}\right) + d} = \frac{-b+b}{\frac{ad-bc}{a}} = \frac{a(-b+b)}{ad-bc} = 0.$$
Now, since i is the identity mapping, we must also have $-b/a = 0$, or b = 0.
26. Show also that, in the above formulation of i, in the denominator, we must have
c = 0. Hint:
$$i\left(-\frac{d}{c}\right) = \frac{a\left(-\frac{d}{c}\right) + b}{c\left(-\frac{d}{c}\right) + d} = \frac{-\frac{ad-bc}{c}}{-d+d} = \frac{ad-bc}{c(d-d)} = \infty.$$
Now, since i is the identity mapping, we must also have $-d/c = \infty$, or c = 0.
27. Conclude that, in the above formulation of i, we must also have d = a. Hint:
thanks to the previous exercises, i(1) = a/d. Now, since i is the identity mapping,
we must also have a/d = 1, or d = a.
28. Recall that $\xi$ is defined by
$$\xi(g) \equiv PCgP^{-1} \quad (g \in G)$$
(Sect. 5.8.8). Show that $\xi$ maps G onto M.
29. Show that $\xi$ preserves algebraic operations:
$$\xi(g)\xi(g') = \xi(gg'), \quad g, g' \in G.$$
30. Conclude that $\xi$ is indeed a homomorphism.
31. From the above exercises about the identity mapping $i \in M$ and its algebraic
formulation, conclude that the kernel of $\xi$, $\xi^{-1}(i)$, contains only nonzero scalar
multiples of the 2 x 2 identity matrix $I \in G$.
32. Furthermore, show that $\xi^{-1}(i)$ contains all nonzero scalar multiples of $I \in G$.
33. Conclude that $\xi^{-1}(i)$ is the same as the center of G, calculated above.
34. Show that the upper triangular matrices make a subgroup in G.
35. Similarly, show that the lower triangular matrices make a subgroup in G.
36. Let n be some natural number. Consider the set $G_n$, containing the nonsingular
n x n complex matrices. Show that $I_n$, the identity matrix of order n, is the unit
element in $G_n$.
37. Show that $G_n$ is indeed a group.
38. Show that the diagonal matrices make a subgroup in $G_n$.
39. Show that the center of $G_n$ is
$$C_n \equiv \{zI_n \mid z \in \mathbb{C},\ z \neq 0\}.$$
40. Verify that $C_n$ is indeed a group in its own right.
41. Conclude that $C_n$ is indeed a legitimate subgroup of $G_n$.
42. Show that $G_n$ acts on the set $\mathbb{C}^n \setminus \{0\}$ (the n-dimensional space, without the
origin).
43. Conclude that $G_n$ can be interpreted as a group of vector functions defined on
$\mathbb{C}^n \setminus \{0\}$, with composition playing the role of the algebraic operation.
44. Show that the factor group $G_n/C_n$ is indeed a group.
45. Show that $G_n/C_n$ acts on $(\mathbb{C}^n \setminus \{0\})/C_n$.
46. Conclude that $G_n/C_n$ can be interpreted as a group of functions defined in
$(\mathbb{C}^n \setminus \{0\})/C_n$, with composition playing the role of the algebraic operation.
47. Show that the upper triangular matrices make a subgroup in $G_n$.
48. Similarly, show that the lower triangular matrices make a subgroup in $G_n$.
49. Use the discussions in Sects. 5.9.1-5.9.4 to study the convergence of periodic
continued fractions. Let j be a fixed natural number (the period). Assume that
the coefficients in the original continued fraction are periodic:
$$a_{k+j} = a_k \quad \text{and} \quad b_{k+j} = b_k, \quad k \geq 1.$$
Find algebraic conditions on the eigenvalues of the matrix product
$$g_1g_2 \cdots g_j$$
that guarantee convergence to the continued fraction f. The solution can be
found in [62].


Chapter 6
Projective Geometry with Applications
in Computer Graphics


What is a geometrical object? It is something that we humans could imagine and
visualize. Still, in Euclidean geometry, a geometrical object is never defined explicitly,
but only implicitly, in terms of relations, axioms, and logic ([22] and Chap. 6 in
[63]).

For example, a point may lie on a line, and a line may pass through a point. Still, 
a line is not just a collection of points. It is much more than that: an independent 
object, which may contain a given point, or not. 

This way, Euclidean geometry uses no geometrical intuition, but only logic. After 
all, logic is far more reliable than the human eye. 

Still, logic doesn’t give us sufficient order or method. For this purpose, linear 
algebra is far better suited. How to use it in geometry? 

The answer is in analytic geometry: the missing link between geometry and alge- 
bra. For this purpose, we introduce a new axis system. This way, a line is no longer 
an independent object, but a set of points that satisfy a linear equation. 

This way, points are the only low-level bricks. Lines, angles, and circles, on 
the other hand, are high-level objects, built of points. Since these points satisfy an 
algebraic equation, it is much easier to prove theorems. 

In projective geometry, on the other hand, we move another step forward: we use 
not only analytic geometry but also group representation and topology [15, 79]. For 
this purpose, we use the isomorphism theorems proved above. This way, we have a 
complete symmetry between points and lines, viewed as algebraic (nongeometrical) 
objects: a line may now be interpreted as a point, and a point may be interpreted as 
a line. 

In the projective plane, the original axioms in Euclidean geometry take a much
more symmetric form. Just as every two distinct points make a unique line, every
two distinct lines meet at a unique point. (In particular, two parallel lines meet at an
infinity point.) After all, as pure algebraic objects, points and lines mirror each other,
so their roles may interchange.

Similarly, in the projective space, there is a complete symmetry between points
and planes. Just as every three independent points make a unique plane, every three
independent planes meet at a unique point. As a result, the roles of point and plane
may interchange: a plane may be viewed as a single point, whereas a point may
be viewed as a complete plane. After all, both can be interpreted as pure algebraic
objects, free of any geometrical intuition.

In this chapter, we use group theory to introduce the field of projective geometry.
In particular, we use matrix-vector and matrix-matrix multiplication to form a group
of mappings of the original projective plane or space.

To introduce group theory, we focused on an individual transformation, or on the 
matrix that represents it. Here, on the other hand, we focus on the original geometrical 
object (points, lines, and vectors), rather than the mapping that acts upon it. This 
approach is particularly useful in computer graphics [54, 74]. 


6.1 Circles and Spheres 


6.1.1 Degenerate “Circle” 


We start with some preliminary definitions, which will be useful later. In particular, 
we define circles, spheres, and hyperspheres in higher dimensions. For this purpose, 
we must start from a degenerate zero-dimensional “circle”. 

Consider the one-dimensional real axis. Consider two points on it: −1 and 1.
They are antipodal (or opposite) points: placed symmetrically at opposite sides of 0.
Together, they make a new set:

$$S^0 \equiv \{-1, 1\}.$$

Note that this notation has nothing to do with the subgroup S in Chap. 5.

In what sense is $S^0$ a “circle?” Well, it contains those real numbers of absolute
value 1. On the real axis, there are just two such points: −1 and 1. In this sense, the
diameter of $S^0$ is just the line segment leading from −1 to 1.

Now, let’s also introduce an orthogonal axis: the y-axis. This forms the two-dimensional
Cartesian plane. In this plane, the original one-dimensional real axis is
embedded into the horizontal x-axis (Fig. 1.6). In particular, the original antipodal
points −1 and 1 embed into a new pair of antipodal points: (−1, 0) and (1, 0), on
the new x-axis. This is the embedded $S^0$.

The original diameter embeds as well: it becomes the new line segment leading from (−1, 0)
to (1, 0) on the new x-axis. Next, we’ll use this embedded diameter to produce a
more genuine circle.

Fig. 6.1 Antipodal points on the unit circle


6.1.2 Antipodal Points in the Unit Circle 


In the Cartesian plane, the embedded diameter can now rotate counterclockwise,
making a larger and larger angle $\theta$ with the positive part of the x-axis. For each
angle $0 < \theta < \pi$, this makes a new pair of antipodal points, placed symmetrically
at opposite sides of (0, 0):

$$(\cos(\theta), \sin(\theta)) \text{ from above, and } (-\cos(\theta), -\sin(\theta)) \text{ from below}$$

(Fig. 6.1). This way, the original point (1, 0) draws the upper semicircle. Its antipodal
counterpart (−1, 0), on the other hand, draws the lower semicircle. Together, they
draw the entire unit circle:

$$S^1 \equiv \{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 = 1\}.$$


6.1.3 More Circles 


The unit circle $S^1$ can now shift to just any place in the Cartesian plane, to form a
new circle.

In analytic geometry, a circle is characterized by two parameters: a point
$O \equiv (x_0, y_0)$ to mark its center, and a positive number r > 0 to stand for its radius
(Fig. 6.2). From Pythagoras’ theorem, the circle contains those points of distance r
from O:

$$\{(x, y) \in \mathbb{R}^2 \mid (x - x_0)^2 + (y - y_0)^2 = r^2\}.$$

This is indeed a link to algebra: the circle is no longer a low-level abstract object as
in Euclidean geometry, but rather a set of points that satisfy an algebraic equation.

Fig. 6.2 A circle centered at $O = (x_0, y_0)$


6.1.4 Antipodal Points in the Unit Sphere 


What is a diameter in $S^1$? It is a line segment connecting two antipodal points:
$$[(-\cos(\theta), -\sin(\theta)),\ (\cos(\theta), \sin(\theta))].$$

Now, let’s introduce yet another axis: the vertical z-axis, to form the three-dimensional
Cartesian space. This way, the original unit circle embeds right into the horizontal
x-y-plane, for which z = 0:

$$S^1 \times \{0\} = \{(x, y, 0) \in \mathbb{R}^3 \mid x^2 + y^2 = 1\}.$$
In particular, the above diameter also embeds into
$$[(-\cos(\theta), -\sin(\theta), 0),\ (\cos(\theta), \sin(\theta), 0)].$$

Let’s go ahead and rotate it at angle $\phi$, upwards into the new z dimension. For each
angle $0 < \phi < \pi$, this makes a new pair of antipodal points:

$$(\cos(\theta)\cos(\phi),\ \sin(\theta)\cos(\phi),\ \sin(\phi))$$
(which draws the upper semicircle) and
$$(-\cos(\theta)\cos(\phi),\ -\sin(\theta)\cos(\phi),\ -\sin(\phi))$$

(which draws the lower semicircle).

This can be done for each and every pair of antipodal points in the embedded $S^1$,
characterized by some $\theta$. Once this is done for all $0 \leq \theta < \pi$, the upper semicircles
make the upper hemisphere, and the lower semicircles make the lower hemisphere.
Together, we have the entire unit sphere:

$$S^2 \equiv \{(x, y, z) \in \mathbb{R}^3 \mid x^2 + y^2 + z^2 = 1\}.$$
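
As a quick sanity check of this parameterization (a sketch only; the function name `antipodal_pair` and the sample angles are arbitrary), one can verify that each pair lies on the unit sphere and that the two points are opposite to each other:

```python
import math


def antipodal_pair(theta, phi):
    """The pair of antipodal points obtained from the angles theta and phi."""
    p = (math.cos(theta) * math.cos(phi),
         math.sin(theta) * math.cos(phi),
         math.sin(phi))
    return p, tuple(-x for x in p)


p, q = antipodal_pair(0.7, 1.9)
norm_p = sum(x * x for x in p)          # x^2 + y^2 + z^2 for the first point
norm_q = sum(x * x for x in q)          # the same for its antipodal counterpart
midpoint = tuple((x + y) / 2 for x, y in zip(p, q))
print(norm_p, norm_q, midpoint)
```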
6.1.5 General Multidimensional Hypersphere


The above procedure may repeat for higher and higher dimensions as well. By mathematical
induction on n = 1, 2, 3, ..., we obtain the general hypersphere:
$$S^{n-1} \equiv \{(v_1, v_2, \ldots, v_n) \in \mathbb{R}^n \mid v_1^2 + v_2^2 + \cdots + v_n^2 = 1\}.$$
Note that this is just a notation: the superscript n − 1 is not a power, although it has
something to do with power: it reflects the fact that the above procedure has been
iterated n − 1 times, to form $S^1, S^2, \ldots, S^{n-1}$.
For instance, by setting n = 4, we obtain the hypersphere

$$S^3 \equiv \{(x, y, z, w) \in \mathbb{R}^4 \mid x^2 + y^2 + z^2 + w^2 = 1\}.$$


6.1.6 Complex Coordinates 


To visualize the above hypersphere geometrically, let’s introduce the new complex
coordinates $c_1$ and $c_2$. The original real coordinates x and y will then serve as real
and imaginary parts in $c_1$. The third and fourth real coordinates, z and w, on the other
hand, will serve as real and imaginary parts in $c_2$.
The original hypersphere $S^3$ can now be defined in terms of the new complex
coordinates:
$$S^3 = \{(c_1, c_2) \in \mathbb{C}^2 \mid |c_1|^2 + |c_2|^2 = 1\}.$$

To illustrate, let’s define $r \equiv |c_1|$: the radius of some circle in the $c_1$-plane, around
the origin x = y = 0 (Fig. 6.3). The complementary circle of radius $|c_2| = \sqrt{1 - r^2}$,
on the other hand, can then be drawn in the $c_2$-plane, around the origin z = w = 0
(Fig. 6.4).

Fig. 6.3 The first complex coordinate $c_1 \equiv x + y\sqrt{-1}$. The circle contains complex
numbers $c_1$ with $|c_1|^2 = x^2 + y^2 = r^2$, for a fixed radius r > 0

Fig. 6.4 The second complex coordinate $c_2 \equiv z + w\sqrt{-1}$. The circle contains complex
numbers $c_2$ with $|c_2|^2 = z^2 + w^2 = 1 - r^2$, where r < 1 is the radius of the former
circle: the $c_1$-circle above

Now, let’s pick one point from the former $c_1$-circle, and another point from the
latter $c_2$-circle (Fig. 6.5). Together, they form a new four-dimensional point
$(c_1, c_2) = (x, y, z, w) \in S^3$:

$$x^2 + y^2 + z^2 + w^2 = |c_1|^2 + |c_2|^2 = r^2 + 1 - r^2 = 1.$$

Fig. 6.5 The new $|c_1|$-$|c_2|$ plane, where $c_1 \equiv x + y\sqrt{-1}$ and $c_2 \equiv z + w\sqrt{-1}$ are
formed from the original $(x, y, z, w) \in \mathbb{R}^4$. The arc contains those points for
which $|c_1|^2 + |c_2|^2 = 1$, including those points in the $c_1$- and $c_2$-circles above

How does the $c_1$-circle in Fig. 6.3 relate geometrically to the $c_2$-circle in Fig. 6.4?
This is illustrated in the two-dimensional $(|c_1|, |c_2|)$-plane (Fig. 6.5). This completes
the missing link between $c_1$ and $c_2$: one just needs to pick a point from the arc

$$|c_1|^2 + |c_2|^2 = 1.$$
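
The construction can be mimicked numerically. In the sketch below, the function name `point_on_s3`, the radius, and the two angles (called alpha and beta here) are illustrative assumptions: one point is picked from the circle of radius r in the first complex plane, another from the complementary circle in the second, and the resulting four real coordinates are checked against the equation of the hypersphere.

```python
import cmath
import math


def point_on_s3(r, alpha, beta):
    """Pick c1 on the circle |c1| = r and c2 on the circle |c2| = sqrt(1 - r^2)."""
    c1 = cmath.rect(r, alpha)
    c2 = cmath.rect(math.sqrt(1 - r * r), beta)
    return c1, c2


c1, c2 = point_on_s3(0.6, 1.0, 2.5)
x, y, z, w = c1.real, c1.imag, c2.real, c2.imag
total = x * x + y * y + z * z + w * w
print(abs(c1), abs(c2), total)
```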


6.2 The Complex Projective Plane 


6.2.1 The Complex Projective Plane 


We start from the easy case: the complex projective plane. Actually, we’ve already
met it in Chap. 5, Sect. 5.8.7. Here, however, we put it in a wider geometrical context.
Recall the set of nonzero two-dimensional complex vectors:

$$V \equiv \mathbb{C}^2 \setminus \{(0, 0)\}$$
(Chap. 5, Sects. 5.8.2-5.8.4). Recall that it splits into disjoint planes of the form

$$C\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \equiv \left\{ z\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \;\middle|\; z \in \mathbb{C},\ z \neq 0 \right\},$$
where $c_1$ and $c_2$ are some complex parameters that do not both vanish at the same
time (Fig. 5.8).
How to visualize such a complex plane? Well, for instance, by setting $c_2 = 0$,
we have the horizontal complex plane in Fig. 5.9. By setting $c_1 = 0$, on the other
hand, we have the vertical complex plane

$$\{(0, z) \mid z \in \mathbb{C},\ z \neq 0\}.$$

Thus, the complex projective plane is just the family of such planes, each considered
as an individual element:
$$V/C \equiv \{Cv \mid v \in V\}.$$

In this set, there is no duplication: each element of the form Cv appears only once,
with v being some representative picked arbitrarily from it.

This is no group: there is no algebraic operation. After all, V was never a group
in the first place. So, there is no point in talking about homomorphism. Still, there is a
point in talking about homeomorphism, to visualize what V/C looks like topologically,
or how continuous it may be.


6.2.2 Topological Homeomorphism onto the Sphere 


In Chap.5, Sect.5.8.7, we have already seen that the complex projective plane is 
topologically homeomorphic to the extended complex plane: both have the same 
continuity properties. Furthermore, the extended complex plane is topologically 
homeomorphic to the sphere. Thus, in summary, the complex projective plane is 
topologically homeomorphic to the sphere: 


$$V/C \simeq \mathbb{C} \cup \{\infty\} \simeq S^2.$$


Here, “$\simeq$” means topological homeomorphism (an invertible mapping that preserves
continuity), not algebraic isomorphism. After all, these are just sets, not groups, 
so there is no algebraic operation to preserve. Below, we’ll extend this to higher 
dimensions as well. 


6.2.3 The Center and Its Subgroups


In Sect. 6.1.5, we’ve defined the general hypersphere $S^{n-1} \subset \mathbb{R}^n$. Note that this
notation has nothing to do with the subgroup S' in Chap. 5. 

Now, let’s use G: the group of $2 \times 2$ nonsingular complex matrices (Chap. 5,
Sect. 5.8.1). In G, the unit element is just the $2 \times 2$ identity matrix $I$. Furthermore,
the center $C \subset G$ contains the nonzero scalar multiples of $I$. (See exercises at the
end of Chap. 5.)

Let’s write C as the product of two subgroups. For this purpose, let $H \subset C$
contain the positive multiples of $I$:


$$H = \{rI \mid r \in \mathbb{R},\ r > 0\}.$$


It is easy to see that H is indeed a group in its own right.
Now, let us define yet another subgroup $U \subset C$:


$$U = \{zI \mid z \in \mathbb{C},\ |z| = 1\}.$$

Using the unit circle $S^1$ (Sect. 6.1.2), this can be written more concisely as

$$U \simeq S^1.$$


It is easy to see that U is indeed a group in its own right. 


6.2.4 Group Product 


What is the product of U and H? Well, it contains the products of an element from
U with an element from H:


$$UH = \{uh \mid u \in U,\ h \in H\}$$

(Chap. 5, Sect. 5.10.1).
The original group G isn’t commutative: two matrices do not necessarily commute
with each other. Fortunately, its center C is. For this reason,


$$uh = hu, \quad u \in U,\ h \in H.$$


This implies that

$$UH = HU.$$


Moreover, it also implies that UH is indeed a group in its own right.


6.2.5 The Center: A Group Product


So, UH is a subgroup of G. Is it a subgroup of C as well? Well, to check on this,
let’s pick an element from UH:


$$uI \cdot rI = (ur)I \in C \quad (u \in \mathbb{C},\ |u| = 1,\ r \in \mathbb{R},\ r > 0).$$


This shows that

$$UH \subset C.$$


Conversely, is C a subgroup of UH? Well, to check on this, let’s pick an element
from C: a nonzero complex multiple of $I$. Fortunately, each nonzero complex number
$z \in \mathbb{C}$ has the polar decomposition


$$z = |z| \exp(\theta\sqrt{-1}),$$


where

$$\theta = \arg(z)$$


is the angle that z makes with the positive part of the real axis. (See exercises at the
end of Chap. 8.) Thus,


$$zI = \exp(\theta\sqrt{-1})I \cdot |z|I \in UH \quad (z \in \mathbb{C},\ z \neq 0),$$


as required. This implies that

$$C \subset UH.$$


In summary, we have 
C=UH. 


Let’s use this decomposition to visualize the complex projective plane geometrically. 
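
The factorization C = UH can be checked numerically on a single scalar. The following is a minimal Python sketch, not part of the original text: the function name `polar_factors` is hypothetical, and `cmath.phase` returns the angle in $(-\pi, \pi]$ rather than in $[0, 2\pi)$, which produces the same unit factor u.

```python
import cmath


def polar_factors(z):
    """Split a nonzero complex scalar z into (u, r) with |u| = 1, r > 0, z = u * r.

    u plays the role of the factor in U, and r the factor in H.
    """
    if z == 0:
        raise ValueError("the zero scalar is not in the center C")
    r = abs(z)                 # positive real factor (the H part)
    theta = cmath.phase(z)     # arg(z), returned in (-pi, pi]
    u = cmath.exp(1j * theta)  # point on the unit circle (the U part)
    return u, r


u, r = polar_factors(3 - 4j)
print(abs(u), r)
```

The scalar matrix zI then factorizes as (uI)(rI), with uI in U and rI in H.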


6.2.6 How to Divide by a Product? 


Thanks to the above factorization, each individual vector $v \in V$ spans the complex
plane

$$Cv = (UH)v = U(Hv),$$


where Hv is just a “ray”:


$$Hv = \{hv \mid h \in H\} = \{rv \mid r \in \mathbb{R},\ r > 0\},$$

and U(Hv) is not just one ray but a complete fan of rays, making a complete complex
plane:


$$U(Hv) = \{u(hv) \mid h \in H,\ u \in U\}
= \{\exp(\theta\sqrt{-1})\, rv \mid r, \theta \in \mathbb{R},\ r > 0,\ 0 \le \theta < 2\pi\}.$$


Together, all such planes make the complex projective plane:


$$V/C = \frac{V}{UH} = \{(UH)v \mid v \in V\} = \{U(Hv) \mid Hv \in V/H\} = \frac{V/H}{U}.$$

In the above, there is no duplication: each equivalence class in V/H is represented,
say, by a unique unit vector in $S^3$:


$$\frac{v}{\|v\|}.$$


Thus, V/H is mirrored by the hypersphere $S^3$ in Sect. 6.1.5. Since U is also mirrored
by $S^1$, we have

$$V/C \simeq S^3/S^1.$$


Here, “$\simeq$” stands for topological homeomorphism (an invertible mapping that
preserves continuity), and $S^3/S^1$ contains equivalence circles in the hypersphere $S^3$.
Let’s see what this looks like geometrically.


6.2.7 How to Divide by a Circle? 


How to “divide” by $S^1$? Well, shrink each equivalence circle into a single point that
lies in it. In Chap. 5, Sect. 5.8.7, this is done algebraically: divide by the second
complex coordinate, $c_2$. Here, on the other hand, this is done geometrically.

Fortunately, the equivalence circle has already shrunk with respect to $|c_2|$. All that
is left to do is to shrink it with respect to the angle $\arg(c_2)$ as well. For this purpose,
the circle in Fig. 6.4 has to shrink to just one point on it. This way, the second complex
coordinate, $c_2$, reduces to the nonnegative real coordinate $|c_2|$. This produces the top
hemisphere in the three-dimensional $(x, y, |c_2|)$-space:


$$\{(x, y, |c_2|) \in \mathbb{R}^3 \mid |c_2| \ge 0,\ x^2 + y^2 + |c_2|^2 = 1\}.$$

What do we have at the bottom of this hemisphere? Well, this is the equator:


$$x^2 + y^2 = 1, \quad \text{or} \quad c_2 = 0.$$

In the arc in Fig. 6.5, this is the lower endpoint: $r = |c_1| = 1$.

At the equator, we can no longer divide by $c_2 = 0$. Instead, we must divide by
$c_1$. More precisely, it is only left to divide by $\arg(c_1)$. This shrinks the entire equator
into the single point $C(1, 0)^t$: the infinity point in the complex projective plane. This
“closes” the hemisphere from below, at the unique infinity point that contains the
entire (shrunk) equator.

What is this topologically? We already know it well: this is just the sphere in
Sect. 6.1.4! In summary, we have the topological homeomorphism


$$V/C \simeq S^3/S^1 \simeq S^2,$$


in agreement with Sect. 6.2.2.
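
The two-stage division (first by H, then by U) can be sketched in Python. This is only an illustration under stated assumptions: the function name `hemisphere_point` and the string "infinity" used for the shrunk equator are hypothetical choices, not notation from the text.

```python
import math


def hemisphere_point(c1, c2):
    """Representative of the plane C(c1, c2)^t on the closed top hemisphere.

    Returns (x, y, |c2|) with x^2 + y^2 + |c2|^2 = 1, or the string
    "infinity" when c2 = 0 (the whole equator shrinks to one point).
    """
    norm = math.sqrt(abs(c1) ** 2 + abs(c2) ** 2)
    if norm == 0:
        raise ValueError("(0, 0) is not in V")
    c1, c2 = c1 / norm, c2 / norm   # divide by H: land on the hypersphere S^3
    if c2 == 0:
        return "infinity"           # the point C(1, 0)^t
    phase = c2 / abs(c2)            # unit scalar with the angle arg(c2)
    c1 = c1 / phase                 # divide by U: c2 turns into |c2| > 0
    return (c1.real, c1.imag, abs(c2))


p = hemisphere_point(1 + 2j, 3 - 1j)
q = hemisphere_point((2 - 5j) * (1 + 2j), (2 - 5j) * (3 - 1j))
print(p, q)
```

Both calls describe the same plane, so they should return the same point up to rounding errors.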


6.2.8 Second and Third Isomorphism Theorems 


In the above, we dealt with sets, not groups. This is why “$\simeq$” meant just topological
homeomorphism, not algebraic isomorphism. After all, there is no algebraic operation
to preserve. In this section, on the other hand, we deal with groups once again.
In this context, “$\simeq$” means not only topological homeomorphism but also algebraic
isomorphism.

To understand its inner structure, we had better reconstruct the original factor group
G/C more patiently, in two stages. First, define G/H, whose elements are of the
form

$$Hg = \{rg \mid r \in \mathbb{R},\ r > 0\}.$$


What is Hg? Well, it has two possible interpretations. In G/H, it is an individual
element. In G, on the other hand, it is a subset: an equivalence class (induced
by the subgroup $H \subset G$), which may contain many elements. Fortunately, these
interpretations mirror each other.

What is the center of G/H? Well, it is just


$$C/H \subset G/H$$

(see exercises below). Its elements are of the form

$$H\exp(\theta\sqrt{-1}), \quad \theta \in \mathbb{R},\ 0 \le \theta < 2\pi.$$


Now, let’s go ahead and divide by this center, to have the factor group of factor groups:
(G/H)/(C/H). What is a typical element in it? Well, take the factor group C/H in
the denominator, and use it to multiply a representative from the factor group G/H
in the numerator. This produces an individual element in (G/H)/(C/H), which can
also be viewed as a subset of G/H: a complete equivalence class in G/H, induced
by C/H:

$$(C/H)Hg = \{H\exp(\theta\sqrt{-1})Hg \mid \theta \in \mathbb{R},\ 0 \le \theta < 2\pi\}
= \{H(\exp(\theta\sqrt{-1})g) \mid \theta \in \mathbb{R},\ 0 \le \theta < 2\pi\}
\subset G/H$$


(Chap. 5, Sects. 5.7.3 and 5.8.5).
So, we also have a mirroring between an individual element in (G/H)/(C/H)
and a complete equivalence class in G:


$$(C/H)Hg \leftrightarrow \{\exp(\theta\sqrt{-1})\, rg \mid r, \theta \in \mathbb{R},\ r > 0,\ 0 \le \theta < 2\pi\} = Cg.$$


This makes a new isomorphism:


$$\frac{G/H}{C/H} \simeq G/C.$$


But we already know this formula well: this is just the third isomorphism theorem
(Chap. 5, Sect. 5.10.2).
Furthermore, in our case, we know quite well what C/H looks like:


$$C/H \simeq U.$$


As a matter of fact, this is just a special case of the second isomorphism theorem.
Indeed, in Chap. 5, Sect. 5.10.1, just substitute $T \leftarrow U$ and $S \leftarrow H$, and note that
they have just one element in common: the unit element.

Combining these results, we can write the above less formally:


$$\frac{G/H}{U} \simeq G/C.$$


Thus, we got what we wanted. To divide by C, one could use two stages: first, divide 
by H; then, divide by U as well. After all, C factorizes as C = UH. This way, a 
typical element Cg € G/C is mirrored by U(Hg). 

How does Cg act on the plane $Cv \in V/C$? Well, this action can now factorize as
well:


$$Cg(Cv) = U(Hg)U(Hv) = U(HgHv) = U(H(gv)) = (UH)gv = Cgv,$$


as required. This is a rather informal writing style. After all, as a subgroup of G, 
U could act on an individual element in V or G, but not in V/H or G/H. On the 
latter, what should act is C/H, not U. Still, as we’ve seen above, this is essentially 
the same. 

All these algebraic games are very nice, but give little geometrical intuition. For
this purpose, it is sometimes better to drop complex numbers altogether, and stick to 
good old real numbers. Let’s start from the simplest case. 


6.3 The Real Projective Line 


6.3.1 The Real Projective Line 


What is the real projective line? First, redefine V to contain real vectors only:

$$V = \mathbb{R}^2 \setminus \{(0,0)\}.$$


Furthermore, redefine G to contain real matrices only. This way, its center C contains
only real multiples of $I$:


$$C = \{xI \mid x \in \mathbb{R},\ x \neq 0\} = (\mathbb{R} \setminus \{0\})I.$$


This way, we have

$$V/C = \frac{\mathbb{R}^2 \setminus \{(0,0)\}}{(\mathbb{R} \setminus \{0\})I}.$$


This is the real projective line. Why line? Because, in the Cartesian plane, it can be 
modeled by the horizontal line y = 1: each (oblique) line of the form Cv € V/C 
meets this horizontal line at one point exactly. This is indeed the oblique cotangent 
projection (Fig. 5.9). 

There is just one exception: the x-axis doesn’t meet the above horizontal line at 
all, so it must map to $\infty$. Fortunately, there is a more uniform way to model the
real projective line geometrically. For this purpose, we must use some algebra once 
again. 

What are the subgroups of the new center C? Well, once confined to real numbers
only, U is redefined to contain two elements only: $I$ and $-I$:


$$U = \{xI \mid x \in \mathbb{R},\ |x| = 1\} = \pm I \simeq S^0$$


(Sect. 6.1.1). The second subgroup H, on the other hand, remains the same as before.
Thus, each ray of the form $Hv \in V/H$ is spanned by a unique unit vector in $S^1$:


$$Hv = H\left(\frac{v}{\|v\|}\right).$$


This can be done for every $v \in V$. Together, we have a fan of rays, each represented
by a unique point, at which it meets the circle:

$$V/H = \left\{ H\begin{pmatrix} x \\ y \end{pmatrix} \;\middle|\; x, y \in \mathbb{R},\ x^2 + y^2 = 1 \right\} \simeq S^1$$


(Sect. 6.1.2). Hereafter, “$\simeq$” means topological homeomorphism only, not algebraic
isomorphism. After all, on the left, we have just a set, not a group, so there is no
algebraic operation to preserve.

In summary, the real projective line takes the form


$$V/C = \frac{V}{UH} = \frac{V/H}{U} \simeq \frac{S^1}{S^0}.$$


This is the divided circle. 


6.3.2 The Divided Circle


As discussed above, the real projective line is associated with the divided circle:

$$V/C \simeq S^1/S^0.$$


How to visualize this geometrically? Well, this is illustrated in Fig. 6.6. Each line
of the form Cv (for some $v = (x, y)^t \in V$) meets the unit circle at two antipodal
points: $v/\|v\|$ and $-v/\|v\|$. Fortunately, in the divided circle, they are just one and
the same point.

What happens for the horizontal vector $v = (1, 0)^t$? Well, in this case, Cv is just
the x-axis: the infinity object in the real projective line. Indeed, in the cotangent
projection in Fig. 5.9, this line maps to $\infty$. Fortunately, in the divided circle, this line
is mirrored well by the pair $(\pm 1, 0)$ (Fig. 6.6).

How to visualize the divided circle geometrically? Well, take the original unit
circle in Fig. 6.1, and consider each pair of antipodal points as just one point. This
is like taking just the upper semicircle (Fig. 6.7). The lower semicircle, on the other
hand, can be dropped. After all, each point on it is no longer necessary: it is already
mirrored by its upper counterpart.


Fig. 6.6 The line Cv meets the unit circle at two antipodal points: $\pm v/\|v\|$.
Fortunately, in the divided circle, they coincide with each other. For example, the
horizontal x-axis, $C(1, 0)^t$, is represented by the pair $(\pm 1, 0)$


Fig. 6.7 The top semicircle is enough: each line of the form Cv is represented by the unique point
$v/\|v\|$. There is just one exception: the horizontal x-axis $C(1, 0)^t$ is still represented by the pair
$(\pm 1, 0)$. In the divided circle, these points are considered as one and the same. Topologically, this
“closes” the semicircle from below, producing a circle


Or is it? Well, there is just one exception: the points $(\pm 1, 0)$ are both needed,
and shouldn’t be dropped. Instead, they should unite into just one point. Topologically, this
“closes” the semicircle at the bottom, producing a closed circle:


$$V/C \simeq S^1/S^0 \simeq S^1.$$


We are now ready to move on to higher dimensions as well. 
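
To make the divided circle concrete, here is a minimal Python sketch that picks one representative for each line Cv on the upper semicircle. The function name `semicircle_rep` is hypothetical, and choosing (1, 0) rather than (-1, 0) for the x-axis is an arbitrary convention adopted here, since the text treats both points as one.

```python
import math


def semicircle_rep(x, y):
    """Unique representative of the line C(x, y)^t on the upper semicircle."""
    norm = math.hypot(x, y)
    if norm == 0:
        raise ValueError("(0, 0) is not in V")
    x, y = x / norm, y / norm         # divide by H: a point on the unit circle
    if y < 0 or (y == 0 and x < 0):   # divide by U = {I, -I}: keep one of the pair
        x, y = -x, -y
    return (x, y)


print(semicircle_rep(3, 4))
print(semicircle_rep(-6, -8))
print(semicircle_rep(-2, 0))
```

The first two calls describe the same line, so they give the same point; the third is the x-axis, the infinity object.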


6.4 The Real Projective Plane 


6.4.1 The Real Projective Plane 


Let’s move on to a yet higher dimension. For this purpose, redefine V as a
three-dimensional vector set:

$$V = \mathbb{R}^3 \setminus \{(0, 0, 0)\}.$$


This way, a vector in V is specified by three real degrees of freedom: its first, second, 
and third coordinates. The projective plane V/C, on the other hand, gives away one 
degree of freedom: the unspecified scalar multiple. This is why V/C depends on two 
degrees of freedom only, and is referred to as a plane.

Furthermore, G is also redefined as the group of $3 \times 3$ nonsingular real matrices.
The unit element in G is now the $3 \times 3$ identity matrix:

$$I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

The subgroups C, U, and H are also redefined to use this new $I$:


$$C = \{xI \mid x \in \mathbb{R},\ x \neq 0\} = (\mathbb{R} \setminus \{0\})I,$$
$$U = \{xI \mid x \in \mathbb{R},\ |x| = 1\} = \pm I \simeq S^0,$$
$$H = \{xI \mid x \in \mathbb{R},\ x > 0\}.$$


With these new definitions, V/C is called the real projective plane. Why plane? 
Because, in the Cartesian space, it can be modeled by the horizontal plane z = 1: 
each oblique line of the form Cv € V/C meets this plane at one point exactly. 

The real projective plane has an important advantage: it has not just one infinity 
point, but many infinity points from all directions. 


6.4.2 Oblique Projection 


To get a better idea about V/C, project it onto the horizontal plane 
$$\{z = 1\} = \{(x, y, 1) \mid x, y \in \mathbb{R}\}.$$


(Note that, unlike in the complex case, here z stands for a real coordinate.) More
precisely, each line of the form $C(x, y, z)^t$ (with $z \neq 0$) projects onto the unique
point at which it meets the above horizontal plane:


$$C\begin{pmatrix} x \\ y \\ z \end{pmatrix} \to \left(\frac{x}{z}, \frac{y}{z}, 1\right).$$


In Fig. 6.8, this horizontal plane is viewed from an eye or a “camera” placed at the 
origin (0, 0, 0), faced upwards. Through this camera, one could see the semispace 


$$\{(x, y, z) \in \mathbb{R}^3 \mid z > 0\}.$$


Fig. 6.8 Oblique projection onto the horizontal plane z = 1. Each line of the form
$C(x, y, z)^t$ ($z \neq 0$) projects onto $(x/z, y/z, 1)$


More precisely, because one could only see a two-dimensional image, one sees the
oblique projection onto the horizontal plane


$$\{(x, y, z) \in \mathbb{R}^3 \mid z = 1\}.$$


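And what about z = 0? Well, in this case, the projection must be radial:


$$C\begin{pmatrix} x \\ y \\ 0 \end{pmatrix} \to \pm\frac{1}{\sqrt{x^2 + y^2}}(x, y, 0).$$


In summary, the entire projection is defined by


$$C\begin{pmatrix} x \\ y \\ z \end{pmatrix} \to
\begin{cases}
(x/z,\ y/z,\ 1) & \text{if } z \neq 0, \\
\pm\dfrac{1}{\sqrt{x^2 + y^2}}(x, y, 0) & \text{if } z = 0.
\end{cases}$$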
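
The two cases of this projection can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `oblique_projection` and the choice to return an infinity point as a pair of antipodal tuples are assumptions made here, not part of the text.

```python
import math


def oblique_projection(x, y, z):
    """Project the line C(x, y, z)^t, following the two cases in the text.

    z != 0: the unique point at which the line meets the plane z = 1.
    z == 0: an infinity point, returned as a pair of antipodal unit vectors.
    """
    if x == 0 and y == 0 and z == 0:
        raise ValueError("(0, 0, 0) is not in V")
    if z != 0:
        return (x / z, y / z, 1.0)
    norm = math.hypot(x, y)
    p = (x / norm, y / norm, 0.0)
    return (p, (-p[0], -p[1], 0.0))


print(oblique_projection(2, 4, 2))
print(oblique_projection(-1, -2, -1))
print(oblique_projection(3, 4, 0))
```

The first two calls use vectors on the same line, so they give the same point on the plane z = 1.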


Next, let’s introduce a more uniform projection. 


6.4.3 Radial Projection 


Alternatively, one might want to use a more uniform approach: always use a radial 
projection, regardless of whether z = 0 or not: 


$$C\begin{pmatrix} x \\ y \\ z \end{pmatrix} \to \pm\frac{1}{\sqrt{x^2 + y^2 + z^2}}(x, y, z)$$


(Fig. 6.9). Let’s see what we obtain.


Fig. 6.9 Radial projection: each line of the form Cv projects onto the pair of
antipodal unit vectors $\pm v/\|v\|$ in the sphere $S^2$


6.4.4 The Divided Sphere


Fortunately, we already have the decomposition 
C=UH. 


Now, each ray of the form $Hv \in V/H$ can also be represented by the unique vector
$v/\|v\|$ in the unit sphere $S^2$. Thus,


$$V/C = \frac{V}{UH} = \frac{V/H}{U} \simeq S^2/S^0.$$
This is the divided sphere: the family of pairs of antipodal points in the original unit 
sphere. In this family, each pair of antipodal points is viewed as an individual object 
in its own right. 
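
As a small illustration, the following Python sketch maps a vector v to one member of its antipodal pair on the unit sphere. The function name `divided_sphere_rep` and the tie-breaking rule (make the last nonzero coordinate positive) are hypothetical conventions chosen here; any consistent rule for picking one point of each pair would serve equally well.

```python
import math


def divided_sphere_rep(x, y, z):
    """One point of the antipodal pair +-v/||v|| that represents the line Cv."""
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0:
        raise ValueError("(0, 0, 0) is not in V")
    p = (x / norm, y / norm, z / norm)   # radial projection onto the unit sphere
    # Assumed convention: flip the sign so the last nonzero coordinate is positive.
    for coord in reversed(p):
        if coord != 0:
            if coord < 0:
                p = tuple(-t for t in p)
            break
    return p


print(divided_sphere_rep(1, 2, 2))
print(divided_sphere_rep(-3, -6, -6))
```

Both calls describe the same line Cv, so they return the same representative.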


6.4.5 Infinity Points 


Let’s consider once again the oblique projection in Sect. 6.4.2. It is not quite uniform:
it distinguishes between ordinary “points” and infinity “points”, where a “point”
means a complete line of the form $Cv \in V/C$.

What is an infinity point in the real projective plane? It is just a horizontal line of
the form $C(x, y, 0)^t$, for some real numbers x and y that do not both vanish at the
same time (Fig. 6.10). Unfortunately, the zero z-coordinate can no longer be used to
divide, or normalize, or project to the horizontal plane


$$\{z = 1\} = \{(x, y, 1) \mid x, y \in \mathbb{R}\}$$


as in Fig. 6.8. 

Fortunately, the unit sphere $S^2$ is much more symmetric than the above plane.
Therefore, the horizontal line $C(x, y, 0)^t$ can still project to $S^2$, as in Fig. 6.10. In
both the oblique and the radial projections, this makes a pair of antipodal infinity
points of the form


$$\pm\frac{1}{\sqrt{x^2 + y^2}}(x, y, 0).$$


Together, they make the infinity circle.


Fig. 6.10 What is an infinity point in the real projective plane? It is a horizontal line
of the form $C(x, y, 0)^t$, projected onto the antipodal points $\pm(x, y, 0)/\sqrt{x^2 + y^2}$ in the
infinity circle


6.4.6 The Infinity Circle


Together, these infinity points make the infinity circle:

$$\{(x, y, 0) \mid x, y \in \mathbb{R},\ x^2 + y^2 = 1\} = S^2 \cap \{(x, y, 0) \mid x, y \in \mathbb{R}\} \simeq S^1.$$

This circle is just the equator in the original unit sphere. Fortunately, unlike in the
complex projective plane in Sect. 6.2.7, it no longer shrinks into a single infinity point.
On the contrary: it contains many useful infinity points, in all horizontal directions.


6.4.7 Lines as Level Sets 


So far, we’ve used a vector of the form

$$v = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} \in V$$


to stand for a particular point in V. Fortunately, this is not the only option: v could
also make a complete plane in V.
Indeed, each real linear function

$$f_v : V \to \mathbb{R}$$

could also be defined in terms of real inner product with a fixed vector $v \in V$:


$$f_v(x, y, z) = f_v((x, y, z)^t) = (x, y, z)v = xv_1 + yv_2 + zv_3.$$


The vector v is also called the gradient of $f_v$, denoted by

$$\nabla f_v = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = v.$$


Because $f_v$ is linear, it must indeed have a constant gradient (Chap. 8, Sect. 8.9.2).

Now, let r be a fixed real number. What is the rth level set of $f_v$? Well, it is the
“origin” of r under $f_v$: it contains those vectors $(x, y, z)^t \in V$ that $f_v$ maps to r:


$$l_{v,r} = f_v^{-1}(r) = \{(x, y, z)^t \in V \mid f_v(x, y, z) = r\}.$$


This notation has nothing to do with inverse. In fact, $f_v$ may have no inverse at all.
After all, a level set may contain many points (Chap. 5, Sect. 5.4.1).

In particular, what is the zero level set? Well, it contains those vectors that are
orthogonal to v. Together, they make a complete plane, orthogonal to v:


$$l_{v,0} = f_v^{-1}(0) = \{(x, y, z)^t \in V \mid f_v(x, y, z) = (x, y, z)v = 0\}.$$


In a linear function as above, the gradient is a constant vector. In a more general 
function, on the other hand, the gradient may change from point to point, and the 
level set may be curved. Fortunately, at each point in it, the gradient is still normal 
(perpendicular) to it in a new sense: normal to the plane tangent to it. 

Note that, if some vector is in $l_{v,0}$, then so is every nonzero scalar multiple of it.
In other words, $l_{v,0}$ is invariant under C:


$$Cl_{v,0} = l_{v,0}.$$


This is what things look like in V. In V/C, on the other hand, this makes a complete
line. To see this, let the plane $l_{v,0}$ cut the horizontal plane


$$\{z = 1\} = \{(x, y, 1) \mid x, y \in \mathbb{R}\}.$$


This produces a line: the “shadow” of $l_{v,0}$ on this horizontal plane.
Note also that, for every $c \in C$,


$$l_{cv,0} = l_{v,0}.$$


Thus, $l_{v,0}$ could be defined not only by the original vector $v \in V$ but also by every
nonzero scalar multiple of v.

So, in V/C, $l_{v,0}$ could be defined in terms of the entire line $Cv \in V/C$ rather
than the concrete vector $v \in V$. Geometrically, this (oblique) line is represented by a
unique point: the point at which it meets the horizontal plane z = 1. This is the start
of duality: in the real projective plane, a point is also a line, and a line is also a point.


6.5 Infinity Points and Line


6.5.1 Infinity Points and Their Projection


In particular, $l_{v,0}$ contains the point

$$(-v_2, v_1, 0),$$


and every nonzero scalar multiple of it (provided that $v_1$ and $v_2$ do not both vanish).
As in Fig. 6.10, this point can project radially
onto a pair of antipodal points in the infinity circle:


$$\pm\frac{1}{\sqrt{v_1^2 + v_2^2}}(-v_2, v_1, 0).$$


As discussed above, $l_{v,0}$ also makes a line: its shadow on (or intersection with) the
horizontal plane z = 1. Thus, in the real projective plane, the original plane $l_{v,0} \subset V$
is interpreted geometrically as a new extended line: the shadow on the horizontal
plane z = 1, plus its “endpoints”: the above pair of antipodal points on the infinity
circle.

By doing this for every $v \in V$, the entire real projective plane is represented
geometrically as a fan of infinite lines on the horizontal plane z = 1, each extended
by a pair of “endpoints”. In summary, the entire real projective plane has projected
onto the horizontal plane z = 1, surrounded by the infinity circle. This is in agreement
with the original oblique projection in Sect. 6.4.2.


6.5.2 Riemannian Geometry 


In the radial projection in Sect. 6.4.3, on the other hand, the entire zero-level set $l_{v,0}$
projects radially only, to cut the original unit sphere at a great circle, centered at the
origin (0, 0, 0). Fortunately, in Riemannian geometry, such a circle is considered as
a line. This way, lines are no longer linear in the usual sense, but rather circular.
Furthermore, each point on a great circle coincides with its antipodal counterpart on
the other side.

This way, the divided sphere $S^2/S^0$ mirrors the horizontal plane z = 1. Each
line on the latter can now extend to a complete plane that passes through the origin,
and cuts the sphere at a great circle. By doing this for two lines that cross each other
at a unique point, we obtain two great circles that meet each other at two antipodal
points, considered as one. Moreover, by doing this for two parallel lines that “meet”
each other at an infinity point, we obtain two great circles that meet each other at
two antipodal points on the equator.


6.5.3 A Joint Infinity Point


For example, consider the new vector


$$v' = \begin{pmatrix} v_1 \\ v_2 \\ v_3' \end{pmatrix} \in V$$


that differs from the original vector v in the z-coordinate only:


$$v_3' \neq v_3.$$


Clearly, the oblique projections of the zero-level sets $l_{v,0}$ and $l_{v',0}$ make two parallel
lines on the horizontal plane z = 1. Where do they “meet” each other? To find out,
we must employ the radial projection, to obtain two great circles, which meet each
other at two antipodal points on the equator:


$$\pm\frac{1}{\sqrt{v_1^2 + v_2^2}}(-v_2, v_1, 0),$$


which are considered as one.

Assume now that we’re given two lines in the real projective plane. Where do 
they meet? To find out, let’s introduce an easy algebraic method. This will show 
once again that the joint point is indeed unique. 


6.5.4 Two Lines Share a Unique Point 


Unlike in Euclidean or analytic geometry, here, in projective geometry, every two
distinct lines meet each other at a unique point. This is true not only in Riemannian
geometry, but also in the original oblique projection (Sect. 6.4.2).

What is this joint point? To find out, consider two independent vectors $v, v' \in V$,
which are not a scalar multiple of each other. The corresponding zero-level sets, $l_{v,0}$
and $l_{v',0}$, make two distinct planes in V. In the real projective plane V/C, on the
other hand, they are considered as two distinct lines. After all, in the horizontal plane
z = 1, they cut two lines: their shadows.

What is their intersection in V? Fortunately, this is available in terms of vector
product:

$$l_{v,0} \cap l_{v',0} = C(v \times v').$$


After all, $v \times v'$ is orthogonal to both v and v' (Chap. 2, Sect. 2.2.4).
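
The vector product makes this easy to compute. Below is a minimal Python sketch with hypothetical helper names (`cross`, `dot`, `meet`) and example lines chosen here for illustration: the vectors v = (1, 1, -3) and w = (1, -1, -1) stand for the lines x + y - 3 = 0 and x - y - 1 = 0 on the plane z = 1.

```python
def cross(v, w):
    """Vector product of two three-dimensional vectors."""
    return (v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0])


def dot(v, w):
    """Real inner product."""
    return sum(a * b for a, b in zip(v, w))


def meet(v, w):
    """A representative of the point shared by the lines l_{v,0} and l_{w,0}."""
    p = cross(v, w)
    if p == (0, 0, 0):
        raise ValueError("v and w are scalar multiples: the lines coincide")
    return p


v = (1, 1, -3)    # the line x + y - 3 = 0 on the plane z = 1
w = (1, -1, -1)   # the line x - y - 1 = 0 on the plane z = 1
p = meet(v, w)
print(p, (p[0] / p[2], p[1] / p[2]))
```

The result is orthogonal to both v and w, and dividing by its z-coordinate gives the ordinary intersection point (2, 1) on the plane z = 1.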


6.5.5 Parallel Lines Do Meet


Let us study the z-coordinate in $v \times v'$:

$$(v \times v')_3 = v_1 v_2' - v_2 v_1'.$$


This is zero if and only if $(v_1', v_2')$ is a scalar multiple of $(v_1, v_2)$, as in the example
in Sect. 6.5.3. In this case, $l_{v,0}$ and $l_{v',0}$ cut not only the horizontal plane z = 1 (at
two parallel lines) but also the infinity circle, at two infinity points:


$$\pm\frac{1}{\sqrt{v_1^2 + v_2^2}}(-v_2, v_1, 0).$$


Thus, two parallel lines on the horizontal plane z = 1 do meet each other at two
antipodal infinity points, making one and the same point on the equator in the divided
sphere.
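
As a worked example (the numbers are chosen here for illustration only), take the parallel lines $x + y - 3 = 0$ and $x + y - 5 = 0$ on the plane z = 1:

```latex
v = \begin{pmatrix} 1 \\ 1 \\ -3 \end{pmatrix}, \qquad
v' = \begin{pmatrix} 1 \\ 1 \\ -5 \end{pmatrix}, \qquad
v \times v' =
\begin{pmatrix}
1 \cdot (-5) - (-3) \cdot 1 \\
(-3) \cdot 1 - 1 \cdot (-5) \\
1 \cdot 1 - 1 \cdot 1
\end{pmatrix}
= \begin{pmatrix} -2 \\ 2 \\ 0 \end{pmatrix}
= 2 \begin{pmatrix} -v_2 \\ v_1 \\ 0 \end{pmatrix}.
```

The z-coordinate vanishes, so the joint point is an infinity point. Its radial projection is the antipodal pair $\pm\frac{1}{\sqrt{2}}(-1, 1, 0)$ on the infinity circle, in agreement with the formula above.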

Thus, in both Riemannian geometry and the real projective plane, there are no 
parallel lines any more: every two distinct lines meet each other at a unique point. Is 
this true even when one of the lines is the infinity line? 


6.5.6 The Infinity Line 


What does the infinity line look like? Well, we’ve already seen that an infinity point
has a zero z-coordinate. What vector is orthogonal to all such points? This is just the
standard unit vector


$$e = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \in V.$$


Indeed, every infinity point must lie in the plane orthogonal to e, which is the horizontal
plane z = 0:


$$l_{e,0} = \{(x, y, z)^t \in V \mid f_e(x, y, z) = (x, y, z)e = z = 0\} = \{(x, y, 0)^t \in V\}.$$


So, in the real projective plane V/C, the infinity line is just the horizontal plane
z = 0. After all, once projected onto the unit sphere as in Fig. 6.10, it makes the
entire infinity circle.

The infinity line also meets every other line at a unique infinity point. Indeed,
every vector $v \in V$ that is not a scalar multiple of e must have a nonzero component:
$v_1 \neq 0$ or $v_2 \neq 0$. Therefore, the zero-level sets of v and e intersect each other at the
line

$$l_{v,0} \cap l_{e,0} = C(v \times e) = C\begin{pmatrix} -v_2 \\ v_1 \\ 0 \end{pmatrix}.$$


Once projected on the unit sphere, this line makes the infinity point


$$\pm\frac{1}{\sqrt{v_1^2 + v_2^2}}(-v_2, v_1, 0).$$


This is indeed the unique joint point of the original lines $l_{v,0}$ and $l_{e,0}$ in the real
projective plane.

We’re now ready to see that, in projective geometry, there is a complete symmetry 
between points and lines: just as every two distinct lines meet each other at a unique 
point, every two distinct points make a unique line. This is proved algebraically: there 
is no longer any need to assume a specific axiom for this, as is done in Euclidean 
geometry. 


6.5.7 Duality: Two Points Make a Unique Line 


In projective geometry, a vector $v \in V$ may have two different interpretations: either
as the point Cv, or as the line $l_{v,0}$. Let’s use this duality to form a complete symmetry
between points and lines.

Indeed, as discussed in Sect. 6.5.4, the vector product $v \times v'$ produces the unique
joint point of the distinct lines $l_{v,0}$ and $l_{v',0}$. Fortunately, this also works the other
way around: once v and v' are interpreted as the points Cv and Cv' in V/C, their
vector product makes the unique line that passes through both of them.

Indeed, since both v and v' are orthogonal to $v \times v'$, both belong to the zero-level
set of $f_{v \times v'}$:

$$v, v' \in l_{v \times v', 0},$$


or

$$Cv, Cv' \subset l_{v \times v', 0}.$$


Thus, in projective geometry, vector product can be applied to two distinct objects of
the same kind, to form a new object (of a new kind) that lies in both of them. If the
original objects are interpreted as points, then the new object is the line that passes
through them. If, on the other hand, the original objects are interpreted as lines, then
the new object is the point they share.
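
As a worked example (the numbers are chosen here for illustration only), take the points (2, 1) and (0, 3) on the plane z = 1, represented by $v = (2, 1, 1)^t$ and $v' = (0, 3, 1)^t$:

```latex
v \times v' =
\begin{pmatrix}
1 \cdot 1 - 1 \cdot 3 \\
1 \cdot 0 - 2 \cdot 1 \\
2 \cdot 3 - 1 \cdot 0
\end{pmatrix}
= \begin{pmatrix} -2 \\ -2 \\ 6 \end{pmatrix},
\qquad
l_{v \times v', 0} \cap \{z = 1\}: \quad -2x - 2y + 6 = 0
\iff x + y = 3.
```

Both original points satisfy $x + y = 3$, so this is indeed the line that passes through them.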


6.6 Conics and Envelopes


6.6.1 Conic as a Level Set


So far, we’ve only studied the sets V and V/C. Now, let us also study the groups G
and G/C that act upon them.

A conic (ellipsoid, hyperboloid, or paraboloid) in V is defined by some symmetric
matrix $g \in G$ [48]. For this purpose, consider the quadratic function


$$q_g : V \to \mathbb{R},$$


defined by

$$q_g(v) = v^t g v.$$


Here, we assume that g is indefinite, so $q_g$ may return a positive, negative, or
zero value.

For each real number $r \in \mathbb{R}$, the rth level set of $q_g$ (the origin of r under $q_g$) is
denoted by


$$m_{g,r} = q_g^{-1}(r) = \{v \in V \mid q_g(v) = r\}.$$


This notation has nothing to do with inverse. In fact, $q_g$ may have no inverse at all.
After all, the level set may contain many vectors (Chap. 5, Sect. 5.4.1).
In particular, the zero-level set of $q_g$ is


$$m_{g,0} = q_g^{-1}(0) = \{v \in V \mid q_g(v) = 0\}.$$


This zero-level set is called a conic in V.


6.6.2 New Axis System 


As a real symmetric matrix, g has real eigenvalues and real orthonormal eigenvectors.
(See Chap. 1, Sects. 1.9.4 and 1.10.4, and exercises therein.) These eigenvectors form
a new (real) axis system in V, which may differ from the standard x-y-z system. In
the new axis system, g is in its diagonal form, with its (real) eigenvalues on the main
diagonal. This is indeed the axis system in which the original conic visualizes best.

Thanks to the algebraic properties of the original matrix g, we have a rather good
geometrical picture. Thanks to symmetry, we have the new axis system. Furthermore,
thanks to indefiniteness, the eigenvalues are not of the same sign, so the zero-level
set $m_{g,0}$ is nonempty. For this reason, in terms of the new axis system, the original
conic must be a hyperboloid.

Fortunately, for every $c \in C$,

$$m_{cg,0} = m_{g,0}.$$


Thus, $m_{g,0}$ can be defined not only by the original matrix $g \in G$ but also by the
element Cg in the factor group G/C.


6.6.3 The Projected Conic 


Clearly, if $v \in m_{g,0}$, then every nonzero scalar multiple of v is in $m_{g,0}$ as well:
$Cv \subset m_{g,0}$. For this reason, $m_{g,0}$ is invariant under C:


$$Cm_{g,0} = m_{g,0}.$$


Thus, $m_{g,0}$ can be interpreted not only as a conic in V, but also as a lower dimensional
conic in V/C. Once projected on the horizontal plane z = 1 as in Fig. 6.8, the original
conic indeed produces a one-dimensional conic: ellipse, hyperbola, or parabola.

To make this more concrete, assume that a camera is placed at the origin (0, 0, 0),
faced upwards. Through the camera, one could only see the upper part of the original
conic, with z > 0. More precisely, one only sees a two-dimensional image: the
horizontal plane z = 1, with the conic’s shadow in it: a curve of the form


$$\left\{ \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \in V \;\middle|\; (x, y, 1)\, g \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = 0 \right\}.$$


This is the projected one-dimensional conic in the horizontal plane z = 1.


6.6.4 Ellipse, Hyperbola, or Parabola 


What does the projected conic look like? Well, this depends on the leading quadratic
terms: $x^2$, $xy$, and $y^2$ in the original function $q_g$. The coefficients of these terms can
be found in the minor $g^{(3,3)}$: the $2 \times 2$ upper left block in g (Chap. 2, Sect. 2.1.1).

Like g, $g^{(3,3)}$ is a real symmetric matrix, with a diagonal form: its real eigenvectors
make a new two-dimensional axis system, which may differ from the standard x-y
system. In this new axis system, the projected conic may indeed visualize best. In
fact, if $g^{(3,3)}$ has a positive determinant:


$$\det(g^{(3,3)}) > 0,$$

then its (real) eigenvalues must have the same sign, so the projected conic must be
an ellipse. If, on the other hand,


$$\det(g^{(3,3)}) < 0,$$


then the eigenvalues must have different signs, so the projected conic must be a
hyperbola (in terms of the new two-dimensional axis system). Finally, if the determinant
vanishes:

$$\det(g^{(3,3)}) = 0,$$


then one of the eigenvalues must vanish, so the projected conic must be a parabola
(in terms of the new axis system) in the horizontal plane z = 1.
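
The determinant test can be sketched in a few lines of Python. The function name `classify_projected_conic` and the three example matrices are hypothetical, chosen here for illustration; the sketch follows the three cases above and does not try to detect degenerate or empty curves. With floating-point entries, the comparison with zero would need a tolerance.

```python
def classify_projected_conic(g):
    """Classify the shadow of the conic on z = 1 by the sign of the
    determinant of the upper left 2 x 2 block of the symmetric matrix g."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    if det > 0:
        return "ellipse"
    if det < 0:
        return "hyperbola"
    return "parabola"


circle = ((1, 0, 0), (0, 1, 0), (0, 0, -1))       # x^2 + y^2 - 1 = 0
hyperbola = ((1, 0, 0), (0, -1, 0), (0, 0, -1))   # x^2 - y^2 - 1 = 0
parabola = ((1, 0, 0), (0, 0, -1), (0, -1, 0))    # x^2 - 2y = 0

for g in (circle, hyperbola, parabola):
    print(classify_projected_conic(g))
```

In each example, the comment gives the curve obtained by substituting (x, y, 1) into the quadratic function.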


6.6.5 Tangent Planes 


Let v be some vector in the original conic $m_{g,0}$. Let us apply g to v, to produce
the new vector $gv \in V$. As discussed in Sect. 6.4.7, gv defines the plane $l_{gv,0}$ that is
orthogonal to gv. Furthermore, because $l_{gv,0}$ is invariant under C, it can be interpreted
not only as a plane in V but also as a line in the real projective plane V/C. Moreover,
it can project to a yet more concrete line: its shadow on the horizontal plane z = 1.

Let us now return to the original conic $m_{g,0} \subset V$. Fortunately, the plane $l_{gv,0}$ is
tangent to it at v. Indeed, since $v \in m_{g,0}$, $v^t g v = 0$, so $v \in l_{gv,0}$ as well. Furthermore,
both the conic and the plane have the same normal vector at v. After all, both are
level sets of functions with proportional gradients at v:


$$\nabla q_g(v) = 2gv = 2\nabla f_{gv}$$

(Sect. 6.4.7, and Chap. 8, Sect. 8.9.2). Thus, the mapping

$$v \to gv$$

maps the original point $v \in m_{g,0}$ to a vector that is normal (or perpendicular, or
orthogonal) to both the original conic and the tangent plane at v.
Fortunately, once projected onto the horizontal plane z = 1, the tangent plane also
produces the line (or shadow) that is tangent to the projected conic at the projected
v.


6.6.6 Envelope 


The new vector gv studied above has yet another attractive property: it belongs to
the zero-level set of the quadratic function associated with the inverse matrix $g^{-1}$:

$$q_{g^{-1}}(gv) = (gv)^t g^{-1} g v = (gv)^t v = v^t g^t v = v^t g v = q_g(v) = 0,$$

or

$$gv \in m_{g^{-1},0}.$$

This can be written more compactly as

$$g\, m_{g,0} \subset m_{g^{-1},0}.$$

Now, let’s substitute $g^{-1}$ for g:

$$g^{-1} m_{g^{-1},0} \subset m_{g,0}.$$

By applying g to both sides, we have

$$m_{g^{-1},0} \subset g\, m_{g,0}.$$

In summary,

$$g\, m_{g,0} = m_{g^{-1},0}.$$

In the dual interpretation in Sect. 6.5.7, the original tangent plane $l_{gv,0}$ is viewed as a
mere point: $Cgv \in m_{g^{-1},0}$. In this interpretation, the new conic $m_{g^{-1},0}$ makes a new
envelope: a family of planes, all tangent to the original conic.
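
The identity above is easy to check numerically. The following Python sketch is only an illustration under stated assumptions: the matrix `g`, the point `v`, and the helper names (`mat_vec`, `quad`, `inverse3`) are chosen here and do not come from the text.

```python
import math


def mat_vec(a, v):
    """Multiply a 3 x 3 matrix (list of rows) by a 3-vector."""
    return [sum(a[i][j] * v[j] for j in range(3)) for i in range(3)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def quad(a, v):
    """The quadratic function v^t a v."""
    return dot(v, mat_vec(a, v))


def inverse3(a):
    """Inverse of a nonsingular 3 x 3 matrix, by cofactors."""
    (a00, a01, a02), (a10, a11, a12), (a20, a21, a22) = a
    c00 = a11 * a22 - a12 * a21
    c01 = a12 * a20 - a10 * a22
    c02 = a10 * a21 - a11 * a20
    det = a00 * c00 + a01 * c01 + a02 * c02
    if det == 0:
        raise ValueError("matrix is singular")
    adj = [
        [c00, a02 * a21 - a01 * a22, a01 * a12 - a02 * a11],
        [c01, a00 * a22 - a02 * a20, a02 * a10 - a00 * a12],
        [c02, a01 * a20 - a00 * a21, a00 * a11 - a01 * a10],
    ]
    return [[x / det for x in row] for row in adj]


# An assumed symmetric nonsingular g, and a point v with v^t g v = 0.
g = [[2.0, 1.0, 0.0], [1.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
v = [1.0, 0.0, math.sqrt(2.0)]
gv = mat_vec(g, v)

print("q_g(v)       =", quad(g, v))             # zero up to rounding
print("q_{g^-1}(gv) =", quad(inverse3(g), gv))  # zero up to rounding
```

Both printed values vanish up to rounding error, so gv indeed lies in the zero-level set associated with the inverse matrix.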


6.6.7 The Inverse Mapping 


What is the envelope of the new conic $m_{g^{-1},0}$? To find out, we just need to use $g^{-1}$:

$$g^{-1} m_{g^{-1},0} = m_{g,0},$$

which is just the original conic once again.

More specifically, consider an individual vector of the form $gv \in m_{g^{-1},0}$. The
inverse mapping

$$gv \to g^{-1}gv = v$$

maps it to the vector v that is normal to the new conic at gv:

$$\nabla q_{g^{-1}}(gv) = 2g^{-1}gv = 2v = 2\nabla f_v.$$


6.7 Duality: Conic-Envelope


6.7.1 Conic and Its Envelope 


Thus, the original roles have interchanged: v is no longer interpreted as a mere point
in the original conic, but rather as a complete plane: $l_{v,0}$, tangent to the new conic at
gv. The original tangent plane $l_{gv,0}$, on the other hand, is now interpreted as a mere
point: gv, in the new conic. This is a geometrical observation. Algebraically, it has
already been written most compactly as

$$g^{-1} m_{g^{-1},0} = m_{g,0}.$$

In summary, projective geometry supports two kinds of duality. At the elementary
level, a line takes the role of a point, whereas a point is viewed as a complete line
(Sect. 6.5.7). At the higher level, on the other hand, the original conic is viewed as an
envelope, whereas the original envelope is viewed as a conic.


6.7.2 Hyperboloid and Its Projection 


Consider, for example, the special case

$$g = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

This way, the original conic is the hyperboloid

$$m_{g,0} = \{ v \in V \mid v^t g v = 0 \} = \{ (x, y, z)^t \in V \mid z^2 = x^2 + y^2 \}.$$

In this simple example,

$$g^2 = I,$$

or

$$g^{-1} = g,$$

so the new conic is the same.

Fig. 6.11 The hyperboloid projects onto a circle in the horizontal plane z = 1. Each
plane tangent to the original hyperboloid projects to a line tangent to the circle and
perpendicular to its radius. (Labels in the figure: projected $l_{gv,0}$, projected v,
projected $l_{ggv,0} = l_{v,0}$.)

Now, let’s pick some vector v in the conic, say

$$v = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} -1 \\ 1 \\ \sqrt{2} \end{pmatrix} \in m_{g,0}$$

(Fig. 6.11). The tangent plane at v is perpendicular to

$$gv = \begin{pmatrix} -v_1 \\ -v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \\ \sqrt{2} \end{pmatrix}.$$

More explicitly, the tangent plane at v is

$$l_{gv,0} = \left\{ (x, y, z)^t \in V \mid x - y = -\sqrt{2}\, z \right\}.$$
In particular, v itself belongs not only to the conic but also to this plane, as required. 
To have a better geometrical understanding, let’s project obliquely, to make a
shadow on the horizontal plane z = 1. The projected conic is the circle

$$\left\{ (x, y, 1)^t \in V \mid x^2 + y^2 = 1 \right\}.$$

Furthermore, the projected $l_{gv,0}$ is the line

$$\left\{ (x, y, 1)^t \in V \mid x - y = -\sqrt{2} \right\}.$$

As can be seen in the upper left part of Fig. 6.11, this line is indeed tangent to the
circle at the projected v:

$$\begin{pmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \\ 1 \end{pmatrix}.$$

So far, gv has been used only to form the tangent plane $l_{gv,0}$. Thanks to duality, gv can
also be viewed as a mere point on the new conic, which is just the same hyperboloid:

$$m_{g^{-1},0} = m_{g,0}.$$

More explicitly, the vector

$$gv = \begin{pmatrix} 1 \\ -1 \\ \sqrt{2} \end{pmatrix}$$

projects onto the vector

$$\begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \\ 1 \end{pmatrix}.$$

The tangent plane at gv, which is just

$$l_{ggv,0} = l_{v,0} = \left\{ (x, y, z)^t \in V \mid x - y = \sqrt{2}\, z \right\},$$

projects onto the tangent line

$$\left\{ (x, y, 1)^t \in V \mid x - y = \sqrt{2} \right\},$$

as in the lower right part of Fig. 6.11.

Thanks to duality, $l_{v,0}$ can also be interpreted as a mere point: the original vector
$v \in V$. This is indeed duality: the tangent to the tangent is just the original vector
itself, and the envelope of the envelope is just the original conic itself.

There is nothing special about the above choice of v: every two points v and gv
on the original hyperboloid project onto two antipodal points:

$$\pm \frac{1}{v_3} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}.$$

Furthermore, $l_{gv,0}$ and $l_{v,0}$ project onto two parallel lines that share the same normal
vector:

$$\frac{1}{\sqrt{v_1^2 + v_2^2}} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$$

(Sect. 6.5.5). For this reason, the projected $l_{v,0}$ and the projected $l_{gv,0}$ are both
perpendicular to the projected v and gv. This is indeed as expected from a circle in
Euclidean geometry: the tangent should be perpendicular to the radius ([22] and
Chap. 6 in [63]).

In summary, duality is relevant not only in the original real projective plane but
also in the horizontal plane z = 1: just as the projected $l_{gv,0}$ is tangent to the circle at
the projected v, the projected $l_{v,0}$ is tangent to the circle at the projected gv, on the
other side.


6.7.3 Projective Mappings


A projective mapping (or transformation) acts in the real projective plane V/C. This
way, it can model a three-dimensional motion. Once projected onto the horizontal
plane z = 1, the original three-dimensional trajectory produces a two-dimensional
shadow, easy to illustrate and visualize geometrically.

Let $g \in G$ be a real nonsingular 3 x 3 matrix. Clearly, g can be interpreted as a
linear mapping:

$$v \to gv, \quad v \in \mathbb{R}^3.$$

As discussed in Chap. 5, Sect. 5.8.5, $Cg \in G/C$ can also act on the real projective
plane V/C:

$$Cv \to Cg(Cv) = C(gv), \quad Cv \in V/C.$$

In other words, if v is a representative from the equivalence class $Cv \in V/C$, and
g is a representative from the equivalence class $Cg \in G/C$, then gv may represent
the new equivalence class $Cg(Cv) \in V/C$.

This is what the projective mapping looks like in V/C. What does it look like in
the horizontal plane z = 1? Well, it can break into three stages: first unproject, then
apply Cg, then project. Together, this makes $PCgP^{-1}$:

$$v \to PCgP^{-1}v = PCgCv = PC(gv) = \begin{cases} gv/(gv)_3 & \text{if } (gv)_3 \ne 0 \\ \pm gv/\|gv\| & \text{if } (gv)_3 = 0. \end{cases}$$
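
The three stages (unproject, apply g, project) can be sketched in Python as follows. This is a minimal illustration, not the book's own code: the names `project` and `projective_map` and the two sample matrices are assumptions. An infinity point is returned as a unit vector, which is meaningful only up to sign.

```python
import math


def mat_vec(g, v):
    """Multiply a 3 x 3 matrix (list of rows) by a 3-vector."""
    return [sum(g[i][j] * v[j] for j in range(3)) for i in range(3)]


def project(v):
    """Oblique projection onto z = 1; infinity points are normalized instead."""
    if v[2] != 0:
        return [v[0] / v[2], v[1] / v[2], 1.0]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]  # an infinity point, defined up to sign


def projective_map(g, x, y):
    """Unproject (x, y) to (x, y, 1), apply g, and project back."""
    return project(mat_vec(g, [x, y, 1.0]))


# A translation by (2, 3): the plane z = 1 is mapped onto itself.
shift = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0], [0.0, 0.0, 1.0]]
# A genuinely projective map: the third component depends on x.
perspective = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]

print(projective_map(shift, 0.0, 0.0))         # [2.0, 3.0, 1.0]
print(projective_map(perspective, 1.0, 0.0))   # [0.5, 0.0, 1.0]
print(projective_map(perspective, -1.0, 0.0))  # an infinity point
```

The exact comparison with zero is fine for these hand-picked inputs; with computed data, a small tolerance would be a safer test for an infinity point.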


Let’s look at a few examples. 


6.8 Applications in Computer Graphics 


6.8.1 Translation 


Translation is an important example of a projective mapping that is often used in
computer graphics. Let $\alpha$ and $\beta$ be some real parameters. Consider the matrix

$$g = \begin{pmatrix} 1 & 0 & \alpha \\ 0 & 1 & \beta \\ 0 & 0 & 1 \end{pmatrix}.$$

In the horizontal plane z = 1, in particular, g translates by $(\alpha, \beta)^t$:

$$\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \to g \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} x + \alpha \\ y + \beta \\ 1 \end{pmatrix}.$$

Thus, the horizontal plane z = 1 remains invariant.

This kind of translation, however, is too simple and naive to simulate or visualize
a real three-dimensional motion. In computer graphics, one might want to simulate
the original motion well, before projecting to two dimensions.


6.8.2 Motion in a Curved Trajectory 


For this purpose, consider a planar object in the original space V, with the unit normal
vector n. Suppose that it moves along a given curve or trajectory $t \subset V$. How can
this motion be simulated or visualized best?

The original three-dimensional motion can be approximated by a composition of
many tiny linear translations, each advancing the object by a small step in the direction
tangent to $t \subset V$. Let’s focus on the first step: the next steps can be modeled in the same
way.

Initially, the object is placed at the beginning of t. At this point, let p be the unit
vector tangent (or parallel) to t.


6.8.3 The Translation Matrix 


Define the translation matrix g by

$$g = I + \gamma \cdot p \cdot n^t,$$

where $\gamma$ is a real parameter to be specified later, and $n^t$ is the row vector transpose
to n [71].

Consider, for example, the simple case in which both the planar object and the
tangent vector p lie in the horizontal plane z = 1:

$$n = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix},$$

and

$$p = \begin{pmatrix} \alpha \\ \beta \\ 0 \end{pmatrix}.$$

In this case, the motion is actually two-dimensional:

$$g = I + \gamma \begin{pmatrix} \alpha \\ \beta \\ 0 \end{pmatrix} (0, 0, 1) = \begin{pmatrix} 1 & 0 & \gamma \cdot \alpha \\ 0 & 1 & \gamma \cdot \beta \\ 0 & 0 & 1 \end{pmatrix}.$$

With $\gamma = 1$, this is just the same as in Sect. 6.8.1. More general (genuinely
three-dimensional) motion, on the other hand, requires a more general translation matrix.


6.8.4 General Translation of a Planar Object 


What does a general translation do? Well, it translates the entire planar object in the
direction pointed at by p. To see this, let r be the (real) inner product that n makes
with the initial point in the trajectory. Initially, the planar object lies in its entirety in
the plane (or level set, or shifted zero-level set)

$$l_{n,r} = \{ v \in V \mid n^t v = r \} = r \cdot n + l_{n,0}.$$

Then, each point v in the planar object translates by the same amount:

$$v \to gv = v + \gamma \cdot p \cdot n^t v = v + \gamma \cdot r \cdot p,$$

as required. This completes the first step in the discrete path that approximates the 
original motion. 
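
As a quick numerical illustration (a sketch only: the helper names, the oblique normal, the tangent direction, and the three corner points are all assumed here, not taken from the text), one can build $g = I + \gamma \cdot p \cdot n^t$ in Python and confirm that every point of the plane $n^t v = r$ moves by the same vector $\gamma \cdot r \cdot p$:

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(g, v):
    return [dot(row, v) for row in g]


def translation_matrix(p, n, gamma):
    """Return g = I + gamma * p * n^t as a list of three rows."""
    return [[(1.0 if i == j else 0.0) + gamma * p[i] * n[j] for j in range(3)]
            for i in range(3)]


# Assumed data: an oblique unit normal n and a unit tangent direction p.
n = [1.0 / math.sqrt(3.0)] * 3
p = [1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0), 0.0]
gamma = 0.1
g = translation_matrix(p, n, gamma)

# Three corners of a planar object: each satisfies x + y + z = 3.
corners = [[1.0, 0.0, 2.0], [0.0, 1.0, 2.0], [2.0, 1.0, 0.0]]
r = dot(n, corners[0])

for v in corners:
    shift = [a - b for a, b in zip(mat_vec(g, v), v)]
    print(shift)  # the same vector gamma * r * p for every corner
```

With n = (0, 0, 1), p = (α, β, 0), and γ = 1, the same function reproduces the two-dimensional translation matrix of Sect. 6.8.1.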


6.8.5 Unavailable Tangent 


In practice, however, the tangent p is not always available. Fortunately, it can still
be approximated by the difference between two given points on the trajectory (the
next point minus this point):

$$p = u_2 - u_1.$$

For instance, using the parameter

$$\gamma = \frac{1}{n^t u_1} = \frac{1}{r},$$

we have a new translation matrix that never uses p:

$$g = I + \frac{1}{r}(u_2 - u_1) n^t.$$

This g indeed translates $u_1$ to $u_2$:

$$u_1 \to g u_1 = u_1 + \frac{1}{r}(u_2 - u_1) n^t u_1 = u_1 + (u_2 - u_1) = u_2,$$

as required. The object is then ready to advance from $u_2$ to the next point on the
trajectory.

In the next step, on the other hand, up-to-date values of $u_1$, $u_2$, r, and $\gamma$ should be
used, to design a new translation matrix g, and advance the object to the next point
on the trajectory, and so on.

Finally, to visualize the discrete path well, project it on the horizontal plane z = 1,
as above. This may give a complete animation movie of the original three-dimensional
motion.
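
The whole procedure can be sketched as a short loop. The code below is only an illustration under assumptions: the sampled spiral trajectory, the small triangle, the horizontal normal n, and the names `step_matrix` and `follow` are hypothetical choices, not the book's own implementation.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(g, v):
    return [dot(row, v) for row in g]


def step_matrix(u1, u2, n):
    """g = I + (1/r) (u2 - u1) n^t, with r = n^t u1 (must be nonzero)."""
    r = dot(n, u1)
    if r == 0:
        raise ValueError("n^t u1 must be nonzero")
    return [[(1.0 if i == j else 0.0) + (u2[i] - u1[i]) * n[j] / r
             for j in range(3)] for i in range(3)]


def project(v):
    """Shadow on the plane z = 1 (third component assumed nonzero)."""
    return (v[0] / v[2], v[1] / v[2])


def follow(trajectory, obj, n):
    """Advance the planar object along the sampled trajectory.

    obj lists points in the plane through trajectory[0] with normal n.
    Returns one projected frame (a list of 2-D points) per position.
    """
    frames = [[project(v) for v in obj]]
    for u1, u2 in zip(trajectory, trajectory[1:]):
        g = step_matrix(u1, u2, n)  # up-to-date u1, u2, and r in each step
        obj = [mat_vec(g, v) for v in obj]
        frames.append([project(v) for v in obj])
    return frames


# Assumed example: a rising spiral, and a small triangle that starts on it.
n = [0.0, 0.0, 1.0]
trajectory = [[math.cos(0.1 * i), math.sin(0.1 * i), 1.0 + 0.1 * i]
              for i in range(20)]
triangle = [trajectory[0], [1.2, 0.0, 1.0], [1.0, 0.2, 1.0]]

frames = follow(trajectory, triangle, n)
print(len(frames), frames[-1][0])
```

Each frame could then be passed to any plotting tool to draw the animation; that step is left out here.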


6.8.6 Rotation 


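How can the motion of the Moon in the solar system be visualized? It contains two inner
rotations: the Moon around the Earth, and the Earth around the Sun. The latter may
take place in the horizontal plane z = 1. For instance, the Sun could be at (0, 0, 1),
and the Earth could start from (1, 0, 1). In this case, the Earth’s motion is governed
by the matrix

$$g(\theta) = \begin{pmatrix} \cos(\theta) & -\sin(\theta) & 0 \\ \sin(\theta) & \cos(\theta) & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

This way, the Earth’s orbit is

$$\left\{ g(\theta) \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \;\middle|\; 0 \le \theta < 2\pi \right\} = \left\{ \begin{pmatrix} \cos(\theta) \\ \sin(\theta) \\ 1 \end{pmatrix} \;\middle|\; 0 \le \theta < 2\pi \right\} \simeq S^1.$$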

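The rotation of the Moon around the Earth, on the other hand, is not necessarily
confined to any horizontal plane. On the contrary, it may take place in an oblique
plane, with the normal vector

$$n = \begin{pmatrix} n_1 \\ n_2 \\ n_3 \end{pmatrix},$$

with some nonzero real components $n_1$, $n_2$, and $n_3$. Let’s define two more (real)
orthonormal vectors:

$$m = \frac{1}{\sqrt{n_1^2 + n_2^2}} \begin{pmatrix} -n_2 \\ n_1 \\ 0 \end{pmatrix}, \quad \text{and} \quad k = n \times m.$$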
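This way, n, m, and k form a new axis system in $\mathbb{R}^3$ (Chap. 2, Sects. 2.2.4-2.3.2).
Let’s use them as columns in the new 3 x 3 (real) orthogonal matrix

$$O = (n \mid m \mid k).$$

We are now ready to define the matrix that rotates the Moon by angle $\theta$ in the oblique
m-k plane:

$$\tilde{g}(\theta) = O \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos(\theta) & -\sin(\theta) \\ 0 & \sin(\theta) & \cos(\theta) \end{pmatrix} O^t.$$

This way, if the Moon is initially at m (relative to the Earth), then it will later be at

$$\begin{aligned}
\tilde{g}(\theta) m &= O \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos(\theta) & -\sin(\theta) \\ 0 & \sin(\theta) & \cos(\theta) \end{pmatrix} O^t m \\
&= O \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos(\theta) & -\sin(\theta) \\ 0 & \sin(\theta) & \cos(\theta) \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \\
&= O \begin{pmatrix} 0 \\ \cos(\theta) \\ \sin(\theta) \end{pmatrix} \\
&= \cos(\theta) m + \sin(\theta) k
\end{aligned}$$

(relative to the Earth). For this reason, if the Earth was at the origin (0, 0, 0), and the
Moon was initially at m, then the Moon’s orbit would be

$$\{ \tilde{g}(\theta) m \mid 0 \le \theta < 2\pi \} = \{ \cos(\theta) m + \sin(\theta) k \mid 0 \le \theta < 2\pi \} \simeq S^1.$$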


The Earth, however, is not static, but dynamic: it orbits the Sun at the same time.
Therefore, the true route of the Moon in the solar system is the sum of these two
routes: the Earth around the Sun, plus the Moon around the Earth, at a frequency 12
times as high:

$$\left\{ g(\theta) \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} + \tilde{g}(12\theta) m \;\middle|\; 0 \le \theta < 2\pi \right\}.$$

To visualize, let’s use the discrete angles

$$0 < \theta_1 < \theta_2 < \cdots < \theta_N = 2\pi,$$

for some large natural number N. These N distinct angles produce the discrete path

$$\left\{ g(\theta_i) \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} + \tilde{g}(12\theta_i) m \;\middle|\; 1 \le i \le N \right\}.$$

This discrete path can now project onto the horizontal plane z = 1, by just dividing
each vector by its third component. This may produce a two-dimensional animation
movie to visualize the original three-dimensional motion of the Moon in the solar
system (Fig. 6.12).

Fig. 6.12 The route of the Moon in the solar system, projected on the horizontal plane z = 1.
It is assumed that the Moon rotates around the Earth in an oblique plane, whose normal vector is
$n = (1, 1, 1)^t / \sqrt{3}$
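
A discrete path of this kind can be generated with a few lines of Python. The sketch below is an illustration under assumptions: the function name `moon_route`, the uniform angles $\theta_i = 2\pi i/N$, and the particular in-plane unit vector m (any unit vector orthogonal to n would do) are choices made here. It uses the closed form $\tilde{g}(\theta) m = \cos(\theta) m + \sin(\theta) k$ rather than forming the matrices.

```python
import math


def cross(a, b):
    """Vector product in three dimensions."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def moon_route(n, steps, moon_freq=12):
    """Projected route of the Moon for a unit normal n (n[0], n[1] not both zero)."""
    s = math.hypot(n[0], n[1])
    m = [-n[1] / s, n[0] / s, 0.0]  # assumed choice: unit vector orthogonal to n
    k = cross(n, m)                 # completes the orthonormal system n, m, k
    path = []
    for i in range(1, steps + 1):
        theta = 2.0 * math.pi * i / steps  # uniform angles (an assumption)
        earth = [math.cos(theta), math.sin(theta), 1.0]
        phi = moon_freq * theta
        moon = [earth[j] + math.cos(phi) * m[j] + math.sin(phi) * k[j]
                for j in range(3)]
        # Oblique projection: divide by the third component.
        path.append((moon[0] / moon[2], moon[1] / moon[2]))
    return path


n = [1.0 / math.sqrt(3.0)] * 3  # the normal vector used in Fig. 6.12
path = moon_route(n, 360)
print(len(path), path[-1])
```

For a unit normal with a nonzero third component, the third coordinate of the Moon stays positive, so the division is safe along the entire path.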


6.8.7 Relation to the Complex Projective Plane 


How does the real projective plane relate to the complex projective plane in
Sect. 6.2.1? Well, recall that the latter was first reduced to a hemisphere. For this
purpose, we divided by $\arg(c_2)$ (Sect. 6.2.7). At the equator at the bottom of the
hemisphere, however, it is impossible to divide by $c_2 = 0$. Instead, we must divide
by $\arg(c_1)$, shrinking the entire equator into just one infinity point, thus losing a lot
of valuable information about the original direction of each individual infinity point
on this equator.

Fortunately, the real projective plane improves on this. The equator no longer
shrinks, so the original hemisphere no longer reduces to a standard sphere. This way,
each pair of antipodal infinity points still point in the original direction, storing this
valuable information for future use.


6.9 The Real Projective Space 


6.9.1 The Real Projective Space 


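Let us now go ahead to a yet higher dimension: redefine V as

$$V = \mathbb{R}^4 \setminus \{(0, 0, 0, 0)^t\}.$$

Furthermore, redefine G as the group of 4 x 4 real nonsingular matrices. In this
group, the unit element is the 4 x 4 identity matrix

$$I = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

This new I is now used to redefine the subgroups C, U, and H:

$$\begin{aligned}
C &= \{ xI \mid x \in \mathbb{R},\ x \ne 0 \} = (\mathbb{R} \setminus \{0\}) I \\
U &= \{ xI \mid x \in \mathbb{R},\ |x| = 1 \} = \pm I \simeq S^0 \\
H &= \{ xI \mid x \in \mathbb{R},\ x > 0 \}.
\end{aligned}$$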


We are now ready to project. 


6.9.2 Oblique Projection 


For each vector

$$v = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{pmatrix}$$

in V, Cv is a complete equivalence class: a line through the origin in V (with the origin
itself excluded). Still, it also has another face: an individual element (or “point”) in the
real projective space V/C. Let’s go ahead and project it “obliquely”.

If $v_4 \ne 0$, then Cv could indeed project on the hyperplane

$$\{ (x, y, z, 1)^t \in V \} \subset V$$

simply by dividing by $v_4$:

$$Cv \to \frac{1}{v_4} v.$$

If, on the other hand, $v_4 = 0$, then v represents an infinity point that must project
radially by

$$v \to \pm \frac{1}{\|v\|} v.$$


This completes our “oblique” projection. 
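
In code, this projection is a simple case distinction. The following Python sketch is illustrative only: the name `project_oblique` is hypothetical, and an infinity point is returned as a unit vector that is meaningful only up to sign.

```python
import math


def project_oblique(v):
    """Project a nonzero vector of R^4 'obliquely'.

    If the fourth component is nonzero, divide by it, to land on the
    hyperplane whose fourth coordinate is 1. Otherwise, v represents an
    infinity point: normalize it radially (the result is defined up to sign).
    """
    if v[3] != 0:
        return [c / v[3] for c in v]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


print(project_oblique([2.0, 4.0, 6.0, 2.0]))  # [1.0, 2.0, 3.0, 1.0]
print(project_oblique([3.0, 4.0, 0.0, 0.0]))  # a unit vector: an infinity point
```

Any nonzero multiple of the input gives the same output (up to sign for an infinity point), as expected from a representative of the equivalence class Cv.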


6.9.3 Radial Projection


Alternatively, one could also use a more uniform approach: always project radially,
regardless of whether $v_4$ vanishes or not:

$$v \to \pm \frac{1}{\|v\|} v.$$

Fortunately, in $S^3/S^0$, these antipodal points are considered as one and the same.
As in Sect. 6.4.1, this new projection could also be written algebraically as

$$V/C = \frac{V}{UH} \simeq \frac{V/H}{U} \simeq S^3/S^0,$$

where “$\simeq$” stands for topological homeomorphism.


6.10 Duality: Point-Plane


6.10.1 Points and Planes 


As discussed above, the point $v \in V$ represents the point Cv in the real projective
space V/C. Furthermore, as in Sect. 6.5.7, v also has a dual interpretation: the
three-dimensional hyperplane orthogonal to v:

$$\{ (x, y, z, w)^t \in V \mid x v_1 + y v_2 + z v_3 + w v_4 = 0 \}.$$

Fortunately, this hyperplane is invariant under C. Therefore, it might get rid of one
redundant degree of freedom, and be viewed as a two-dimensional plane in V/C.

In summary, in the real projective space, Cv may take two possible meanings:
either the point Cv, or the plane orthogonal to v.

We’ve already seen duality in the context of the real projective plane: we’ve used 
vector product (defined in Chap. 2, Sect. 2.2.3) to show that two distinct lines meet 
at a unique point, and two distinct points make a unique line (Sects. 6.5.4 and 6.5.7). 

To extend this, we must first extend vector product to four spatial dimensions as 
well. This will help establish duality in the real projective space as well: every three 
independent points make a unique plane, and every three independent planes meet 
at a unique point. 

In a more uniform language, in V/C, three independent objects of one kind make 
a unique object of another kind. Thus, the three original objects could be interpreted 
in terms of either kind: there is a complete symmetry between both kinds. 


6.10.2 The Extended Vector Product


To define the required vector product in four spatial dimensions as well, consider a
row made of four column vectors, the standard unit vectors in V:

$$E = \left( \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} \right).$$

This row will serve as the first row in the 4 x 4 matrix used in the vector product.
The so-called triple vector product can now be defined as a vector function of the
form

$$\times : (\mathbb{R}^4)^3 \to \mathbb{R}^4.$$

What does it do? Well, it takes the column vectors $u, v, w \in \mathbb{R}^4$, and places them as
rows in a new 4 x 4 matrix (whose first row is E). Finally, it returns the determinant:

$$\times(u, v, w) = \det \begin{pmatrix} E \\ u^t \\ v^t \\ w^t \end{pmatrix}.$$

This is a new vector in $\mathbb{R}^4$, as required. Indeed, thanks to the original definition of a
determinant (Chap. 2, Sect. 2.1.1), this is just a linear combination of the items in the
first row: the standard unit vectors in $\mathbb{R}^4$. Let’s use it in the real projective space.
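
Expanding the determinant along its first row gives a direct recipe: the i-th component is a signed 3 x 3 determinant, obtained by deleting the i-th column from the rows $u^t$, $v^t$, $w^t$. The Python sketch below follows this recipe; the function names `det3` and `triple_product` are assumptions made for illustration.

```python
def det3(a):
    """Determinant of a 3 x 3 matrix, given as a list of rows."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def triple_product(u, v, w):
    """Triple vector product in R^4: cofactor expansion along the first row E."""
    rows = [u, v, w]
    result = []
    for col in range(4):
        # Delete column `col` from the three rows u^t, v^t, w^t.
        minor = [[row[j] for j in range(4) if j != col] for row in rows]
        result.append((-1) ** col * det3(minor))
    return result


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


u, v, w = [1, 2, 3, 4], [0, 1, 0, 2], [1, 0, 1, 0]
x = triple_product(u, v, w)
print(x, dot(x, u), dot(x, v), dot(x, w))
```

With integer entries, the three inner products come out as exact zeros, in agreement with the orthogonality property discussed next.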
6.10.3 Three Points Make a Unique Plane


Fortunately, a matrix with two identical rows has a zero determinant (Chap. 2,
Sect. 2.2.4). This is why the new triple vector product is so attractive: it produces a
new vector, orthogonal to u, v, and w:

$$(\times(u, v, w), u) = \det \begin{pmatrix} u^t \\ u^t \\ v^t \\ w^t \end{pmatrix} = 0,$$

$$(\times(u, v, w), v) = \det \begin{pmatrix} v^t \\ u^t \\ v^t \\ w^t \end{pmatrix} = 0,$$

$$(\times(u, v, w), w) = \det \begin{pmatrix} w^t \\ u^t \\ v^t \\ w^t \end{pmatrix} = 0.$$

For this reason, if u, v, and w are linearly independent vectors that represent three
independent points in V/C, then their triple vector product represents the required
“plane” in V/C: the unique plane that contains all three points Cu, Cv, and Cw:

$$Cu, Cv, Cw \in \{ Cx \in V/C \mid (\times(u, v, w), x) = 0 \} \subset V/C.$$


Let us now look at things the other way around. 


6.10.4 Three Planes Share a Unique Point 


In the dual interpretation, on the other hand, u is no longer a mere vector in V, but
rather a complete hyperplane in V: the hyperplane orthogonal to u. Likewise, v is
now viewed as the hyperplane orthogonal to v, and w is now viewed as the hyperplane
orthogonal to w. What point do they share? This is just

$$\times(u, v, w).$$

After all, in V, this point belongs to all three hyperplanes. Therefore, in V/C,

$$C(\times(u, v, w))$$

is indeed the unique point shared by all three planes, as required.
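
Here is a small worked example in Python, as a sketch only: the three planes x = 1, y = 2, and z = 3 (written with a fourth homogeneous coordinate) and all function names are assumptions chosen for illustration. Their normal vectors are fed into the triple vector product, and the result is projected by dividing by its fourth component.

```python
def det3(a):
    """Determinant of a 3 x 3 matrix, given as a list of rows."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def triple_product(u, v, w):
    """Triple vector product in R^4 (expansion along the first row)."""
    rows = [u, v, w]
    return [(-1) ** col * det3([[row[j] for j in range(4) if j != col]
                                for row in rows])
            for col in range(4)]


def shared_point(a, b, c):
    """Point shared by the planes with normals a, b, c, projected obliquely.

    Assumes that the shared point is not an infinity point.
    """
    p = triple_product(a, b, c)
    if p[3] == 0:
        raise ValueError("the planes meet at an infinity point")
    return [coord / p[3] for coord in p]


# The planes x = 1, y = 2, z = 3, written as x - w = 0, y - 2w = 0, z - 3w = 0.
a, b, c = [1, 0, 0, -1], [0, 1, 0, -2], [0, 0, 1, -3]
print(triple_product(a, b, c))  # a multiple of (1, 2, 3, 1)
print(shared_point(a, b, c))    # [1.0, 2.0, 3.0, 1.0]
```

The triple product itself comes out as a nonzero multiple of (1, 2, 3, 1), which is all that matters in V/C; the division only picks a convenient representative.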
Conics can now be defined in the spirit of Sect. 6.6.1. Tangent hyperplanes can
also be defined in the spirit of Sect. 6.6.5. The details are left as an exercise.


6.11 Exercises 


1. In Sect. 6.2.8, show that C/H is a legitimate subgroup of G/H. Hint: see Chap. 5,
   Sect. 5.10.2.
2. Furthermore, show that C/H is normal. Hint: see Chap. 5, Sect. 5.10.2.
3. In Sect. 6.2.8, what is the center of G/H?
4. Show that this center must include C/H. Hint: each element $Hc \in C/H$
   commutes with every element $Hg \in G/H$:

   $$HcHg = H(cg) = H(gc) = HgHc.$$

5. Show that C/H must include the center of G/H. Hint: in the exercises at the end
   of Chap. 5, assume that the matrices A and B commute up to a scalar multiple.
   Fortunately, this scalar must be 1. After all, when either A or B is diagonal, both
   AB and BA have the same diagonal.
6. Conclude that C/H is indeed the center of G/H.
7. How can the formula $U \simeq C/H$ (end of Sect. 6.2.8) be deduced from the second
   isomorphism theorem in Chap. 5, Sect. 5.10.1? Hint: assume that T and S have
   just one joint element: the unit element. Then, substitute $T \leftarrow U$ and $S \leftarrow H$.
8. Show that the set of real nonsingular 3 x 3 matrices is indeed a group.
9. Show that I, the identity matrix of order 3, is indeed the unit element in this
   group.
10. Show that the center of this group is the set of real nonzero scalar multiples of
    I. Hint: see exercises at the end of Chap. 5.
11. Show that U, defined in Sect. 6.4.1, is indeed a group in its own right.
12. Conclude that U is indeed a subgroup of the above center.
13. Show that H, defined in Sect. 6.4.1, is indeed a group in its own right.
14. Conclude that H is indeed a subgroup of the above center.
15. Conclude that UH is indeed a group in its own right.
16. Conclude that UH is a subgroup of the above center.
17. Show that UH is exactly the same as the above center.
18. Show that a “point” in the real projective plane could be viewed as an object in $\mathbb{R}^3$:
    a real nonzero three-dimensional vector, defined up to a nonzero scalar multiple.
19. Show that a “line” in the real projective plane could be interpreted in terms of
    its normal vector, which is a real nonzero three-dimensional vector. This way,
    the “line” is still in $\mathbb{R}^3$: it contains those three-dimensional vectors orthogonal
    to that normal vector.
20. Show that multiplying that normal vector by a nonzero real scalar would still
    produce the same “line” as above.
21. Show that the above “line” is invariant under nonzero scalar multiplication.
22. Conclude that the above “line” is invariant under the above center.
23. Conclude that the above “line” is indeed a legitimate line in the real projective
    plane.
24. Conclude that the line can be rightly called a projective line.
25. Show that, in the real projective plane, two distinct points make a unique line.
    Hint: use the vector product of the original three-dimensional vectors as a normal
    to the required line.
26. Show that, in the real projective plane, two distinct lines share a unique point
    (possibly an infinity point). Hint: take the vector product of the original normal
    vectors.
27. Give an algebraic condition to guarantee that two such projective lines are
    “parallel” to each other, or meet each other at a unique infinity point on the infinity
    circle. Hint: the original normal vectors must have a vector product with a
    vanishing z-coordinate. To guarantee this, from each normal vector, drop the third
    component. The resulting two-dimensional subvectors should be proportional to
    each other.
28. Show that the infinity line meets every other projective line at a unique infinity
    point on the infinity circle. Hint: its normal vector is $(0, 0, 1)^t$, so the above
    condition indeed holds.
29. Show that a “point” in the real projective space could be viewed as an object
    in $\mathbb{R}^4$: a real nonzero four-dimensional vector, defined up to a nonzero scalar
    multiple.
30. Show that a “plane” in the real projective space could be interpreted as an object in
    $\mathbb{R}^4$, in terms of its four-dimensional normal vector. This way, the “plane” contains
    those four-dimensional vectors that are orthogonal to that normal vector.
31. Show that multiplying that normal vector by a nonzero real scalar still produces
    the same “plane” as above.
32. Show that the above “plane” is invariant under nonzero scalar multiplication.
33. Conclude that the above “plane” is indeed a legitimate plane in the real projective
    space.
34. Conclude that the above plane can be rightly called a projective plane.
35. Show that, in the real projective space, three independent points make a unique
    plane. Hint: use the triple vector product (Sect. 6.10.2) of the original linearly
    independent four-dimensional vectors, to produce the required four-dimensional
    normal vector.
36. Show that, in the real projective space, three independent planes share a unique
    point. Hint: take the triple vector product of the original linearly independent
    four-dimensional normal vectors. The resulting four-dimensional vector should
    be interpreted up to a nonzero scalar multiple.
37. Extend the definition of conics in Sect. 6.6.1 to the real projective space in
    Sect. 6.9.1 as well.
38. Define tangent planes in the real projective space, analogous to tangent lines in
    Sect. 6.6.5.
39. For n = 1, 2, 3, ..., define the 2n-dimensional complex projective space.
40. Show that it is topologically homeomorphic to $S^{2n+1}/S^1$.
41. Show that this is just the top half of $S^{2n}$, with a rather strange “equator” at the
    bottom: not $S^{2n-1}$ but rather $S^{2n-1}/S^1$, the infinity hyperplane, which is a lower
    dimensional complex projective space in its own right, defined inductively.
42. Extend the duality established in Sects. 6.5.4-6.5.7 to the complex projective
    space as well. (Be sure to use the complex conjugate in your new kind of vector
    products.)
43. Produce an animation movie of a planar object traveling along a curved trajectory
    in the three-dimensional Cartesian space (Sect. 6.8.1).
44. Produce an animation movie of the Earth and the Moon, traveling in the solar
    system (Sect. 6.8.6).
45. Show that the set of real nonsingular 4 x 4 matrices is indeed a group.
46. Show that I, the identity matrix of order 4, is indeed the unit element in this
    group.
47. Show that the center of this group is the set of real nonzero scalar multiples of
    I. Hint: see exercises at the end of Chap. 5.
48. Show that U, defined in Sect. 6.9.1, is indeed a group in its own right.
49. Conclude that U is indeed a subgroup of this center.
50. Show that H, defined in Sect. 6.9.1, is indeed a group in its own right.
51. Conclude that H is indeed a subgroup of this center.
52. Conclude that UH is indeed a group in its own right.
53. Conclude that UH is a subgroup of the above center.
54. Show that UH is exactly the same as the above center.
55. Show that, in the method in Chap. 4, Sect. 4.4.3, the inverse Lorentz
    transformation back to the x-y-t self system of the second particle is actually interpreted as
    a projective mapping in the real projective plane (Sects. 6.4.1 and 6.7.3). In this
    mapping, the original velocity $(dx'/dt', dy'/dt')$ of the first particle in the lab
    (Fig. 4.6) transforms to the new velocity $(dx/dt, dy/dt)$ of the first particle away
    from the second one (Fig. 4.7). Because we divide by t or t', the time variable
    is eliminated, and is only used implicitly to advance the particle in the direction
    pointed at by the velocity vector.


Chapter 7
Quantum Mechanics: Algebraic Point of View


The matrices introduced above have two algebraic operations. Thanks to addition, 
they make a new linear space. Thanks to multiplication, they also form a group. 

In this group, the commutative law doesn’t hold anymore. Indeed, multiplying 
from the left is not the same as multiplying from the right. How different could these 
operations be? To measure this, we need a new algebraic operation: the commutator. 

Thanks to the commutator, we can now introduce yet another important field: 
quantum mechanics. Indeed, thanks to the above algebraic tools, this can be done in 
a straightforward and transparent way. For this purpose, we redefine momentum and 
energy in their stochastic (probabilistic) face. 

In Chap. 2, we’ve already introduced angular momentum in classical mechanics. 
In the exercises below, on the other hand, we use quantum mechanics to redefine 
angular momentum, and highlight it from an algebraic point of view. This may help 
study a few elementary particles like electrons and photons, with their new property: 
spin. This property is not well understood physically. Fortunately, thanks to groups 
and matrices, it can still be modeled and understood mathematically. 


7.1 Nondeterminism 


7.1.1 Relative Observation 


In Newtonian mechanics, we often consider a particle, or just any physical object. 
Such an object must lie somewhere in space: this is its position. 

At each individual time, the object may have a different position. This is deter- 
ministic: we can measure the position, and tell it for sure. This way, we can also 
calculate how fast it changes: the velocity. From the velocity, we can then calculate 
the momentum and the kinetic energy at each individual time. This gives a complete
picture of the object, and its physical motion. 

In Chap. 4, on the other hand, we’ve seen that things are not so absolute, but more 
relative. The position that I measure in my lab may differ from the position that you 
measure in your own system. In fact, position is meaningless on its own, unless it 
has some fixed reference point: the origin. What is meaningful is just the difference 
between two positions. 

The same is true for another important quantity: time. In fact, time is only relative 
and nonphysical. Indeed, the time that I see in my clock may differ from the time 
that you see in your own clock, particularly if you get away from me fast. 

Don’t worry: no clock is bugged—both work well. Time is not absolute, but only 
relative: it depends on the perspective from which it is measured. This is why time 
is never defined absolutely. 

Even the universe has no absolute beginning. Indeed, you can never time travel 
back to the big bang, but only approach it as a limit. Indeed, the big bang itself is 
singular: the universe was so dense that time was so slow and heavy that it hardly 
moved at all! 

Thus, in special relativity, time is just an observation. To know it, one must look 
and observe. This measurement is relative—it depends on the perspective from which 
it is made. For this reason, two observers may see a different time in their different 
systems. For example, the time that I see in my own clock here on the Earth may 
differ from the time in some other clock, placed on a satellite. 


7.1.2 Determinism 


So, special relativity teaches us to be more modest, and not trust our own eyes. What 
you see in your system is not necessarily the absolute truth: it is often different from 
what I see in my own (moving) system. This applies not only to time but also to other 
important observations. Just like there is no absolute time, there is also no absolute 
position. What exists physically is just the original object itself. 

Together, time and position make a new pair of observations, which depend on 
the system where they are measured. Likewise, momentum and energy make a new 
pair as well. This is why momentum is more fundamental than velocity. 

Fortunately, this is still deterministic. You know what you see, with no doubt. 
This way, the physical quantities are still well defined, uniquely and unambigu- 
ously. Unfortunately, this is true only in the macroscale, but not in the microscale or 
nanoscale, used often in molecules, atoms, and subatomic particles. 
7.1.3 Nondeterminism: Observables


In quantum mechanics, things get yet worse: nothing is certain any more. After all, 
a particle could be so small that its position has no physical meaning whatsoever. 
In this context, the position is no longer an observation, but just an observable: you 
could observe and measure it, but better not. 

Instead of a physical position, the particle may only have a probability to be 
somewhere: a number that tells us how likely it is to be there. Perhaps it is there, and 
perhaps not. We’ll never know, unless we’re ready to take the risk and change our 
physical state forever. 

Indeed, to know the position for sure, you must take a measurement. But this is 
not advisable: the particle is often so small that detecting its position is too hard, and 
requires a complicated experiment, which may change the entire physical state. This 
way, we may lose a lot of information about other important observables, such as 
momentum. 

This also works the other way around: the particle is so small that measuring its 
momentum is too hard, and may require a complicated experiment. As a result, vital 
information is gone, and the original position may never be discovered any more! 

Fortunately, thanks to our advanced algebraic tools, we can now model even a 
highly nondeterministic state like this. Indeed, to model an observable, we can now 
use a matrix. After all, matrices enjoy all sorts of useful algebraic properties. 

This way, there is no need to look or observe as yet: this could wait until later. 
In the meantime, we can still “play” with our matrices, and design more and more 
observables as well. 


7.2 State—Wave Function 


7.2.1 Physical State 


In Newtonian mechanics, we often consider a particle, traveling along the x-axis. At 
time t, its position is x(t). By differentiating, one could also calculate the velocity 
x'(t), and the momentum mx'(t) (where m is the mass). This is the linear momentum 
in the x-spatial dimension. This could be done at each individual time t. 

In quantum mechanics, on the other hand, this is not so easy any more. Indeed, 
there is no determinism any more. In the “true” physics, the particle is nowhere (or 
everywhere...). The physical state only tells us where it could be. Perhaps it is there, 
and perhaps not... 

The physical state is no longer a function x(t), but a nonzero n-dimensional 
(complex) vector v. This vector contains all the information that nature tells us about 
the particle, including where it might be at time t. For this reason, v isn’t fixed, but 
may change in time: v = v(t). 


7.2.2 The Position Matrix 


So, where might the particle be? For this purpose, we have a new n x n diagonal 
matrix: X. On its main diagonal, you can find possible positions that the particle 
might take. 

Consider, for example, some element on the main diagonal: $X_{k,k}$ (for some 
$1 \le k \le n$). How likely is the particle to be at position $x = X_{k,k}$? Well, the 
probability for this can be deduced from v: it is just $|v_k|^2$. 

Clearly, the probabilities must sum to 1. For this purpose, we must assume that 
v has already been normalized to have norm 1. So, what is important is only the 
direction of v, not its norm. For every practical purpose, one may assume that v is 
defined up to a scalar multiple only: 


$$v \in (\mathbb{C}^n \setminus \{0\}) / (\mathbb{C} \setminus \{0\})$$


(Chap. 5, Sects. 5.8.3-5.8.5). 
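
To make this concrete, here is a minimal sketch in Python (NumPy). The four positions on the diagonal of X, the sample state, and the helper name `position_probabilities` are illustrative assumptions, not taken from the text. The sketch normalizes a state, reads off the probabilities $|v_k|^2$, and checks that a nonzero scalar multiple of the state gives the same probabilities.

```python
import numpy as np

# Hypothetical example: n = 4 possible positions on the main diagonal of X.
X = np.diag([-1.0, 0.0, 1.0, 2.0])
positions = np.diag(X)

# An arbitrary nonzero complex state (not yet normalized).
v_raw = np.array([1 + 1j, 2.0, -1j, 0.5])

def position_probabilities(state):
    """Return |v_k|^2 for the normalized version of the given state."""
    normalized = state / np.linalg.norm(state)
    return np.abs(normalized) ** 2

probs = position_probabilities(v_raw)

# Multiplying by any nonzero complex scalar leaves the probabilities unchanged:
# only the "direction" of v matters.
probs_scaled = position_probabilities((3 - 2j) * v_raw)

for x, p in zip(positions, probs):
    print(f"position {x:+.1f}: probability {p:.4f}")
```

The printed probabilities sum to 1, whatever the norm of the raw state was.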


7.2.3 Dynamics: Schrödinger Picture 


Unfortunately, n may be too small. After all, the particle could get farther and farther 
away from the origin, and reach infinitely many positions. To allow this, X must be 
an infinite matrix, with an infinite order. 

For example, a particle could “jump” from number to number along the real axis. 
To model this, X must be as big as 

$$X = \operatorname{diag}(\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots).$$

This way, on its main diagonal, X has all possible positions: all integer numbers. In 
a yet more realistic model, on the other hand, X should be even bigger: on its main 
diagonal, it should contain not only integer but all real numbers. In this case, X is 
not just a matrix, but actually an operator. 

This kind of dynamics is called Schrödinger’s picture: X remains constant at 
all times, whereas v changes from time to time. This setting is more common than 
Heisenberg’s picture, which works the other way around: v remains constant, whereas 
X changes in time. 

For simplicity, however, we try to avoid infinite dimension. Instead, we mostly 
stick to our finite dimension n, and our original n-dimensional vector and n x n 
matrix. 


7.2.4 Wave Function and Phase 


What does the state v look like? Well, it may look like a discrete sine or cosine wave, 
as in Figs. 1.10 or 1.11. This is why v is often called a wave function. 

Still, v is not necessarily real: it may well be a complex vector in $\mathbb{C}^n$. For example, 
it may be the discrete Fourier mode, as in Fig. 1.12. 

Thus, in general, each component $v_k$ is a complex number in its own right. As 
such, it has its own polar decomposition, amplitude times exponent: 

$$v_k = |v_k| \exp(i\theta_k),$$

where $i = \sqrt{-1}$ is the imaginary number, and $\theta_k$ is the phase: the angle that $v_k$ makes 
with the positive part of the real axis. 

The amplitude $|v_k|$ tells us how likely the particle is to be at position $X_{k,k}$. More 
precisely, the probability for this is $|v_k|^2$. The exponent, on the other hand, is a 
complex number of magnitude 1. As such, it has no effect on this probability. Still, it 
may encapsulate important information about other physical properties. 

Once each component has been written in its polar decomposition, v takes a famil- 
iar face: wave function. This way, the original particle also has a new mathematical 
face: wave. As such, it enjoys an interesting physical phenomenon: interference. 


7.2.5 Superposition and Interference 


Two electrons cannot be in exactly the same state at the same time. This is Pauli’s 
exclusion principle. (See exercises below.) Two photons, on the other hand, can. In 
this case, they have the same wave function. 

Two wave functions may sum up, and produce a new wave function: their super- 
position. In this process, the original states sum up, component by component. To 
add two corresponding components to each other, their phases are most important. 
If they match, then they enhance each other. If, on the other hand, they don’t match, 
then they may even cancel (or annihilate) each other. This is the phenomenon of 
interference. 
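
The role of the phases can be sketched numerically. In the following Python (NumPy) example, the two four-component states are hypothetical: they share the same amplitudes, their phases match in the first two components, and they are opposite in the last two. Summing and normalizing shows enhancement in the former and cancellation in the latter.

```python
import numpy as np

def normalize(state):
    return state / np.linalg.norm(state)

# Two hypothetical 4-component states with equal amplitudes everywhere.
amplitude = 0.5 * np.ones(4)
theta = np.array([0.0, 0.3, 1.0, 2.0])            # phases of the first state
first = amplitude * np.exp(1j * theta)

# Second state: matching phases in components 0 and 1,
# opposite phases (shifted by pi) in components 2 and 3.
phase_shift = np.array([0.0, 0.0, np.pi, np.pi])
second = amplitude * np.exp(1j * (theta + phase_shift))

total = first + second                            # component-by-component sum
superposition = normalize(total)
probs = np.abs(superposition) ** 2
```

Here `probs` is concentrated in the first two components only: the last two have annihilated each other, up to rounding.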

Once the wave functions have been summed up and normalized, we have the 
new (joint) state: the superposition of the two original states. In summary, a particle 
actually has two mathematical faces: on one hand, it is a particle. On the other 
hand, it is also a wave. Each face is useful to analyze and explain different physical 
phenomena. 


7.3 Observables Don’t Commute! 


7.3.1 Don’t Look! 


Let’s return to our original particle. It only has a nondeterministic position, where it 
might be. But where is it located in fact? 

Don’t ask! Because, to find out, you must carry out an experiment. With probability 
$|v_k|^2$, you’d then discover a position $x = X_{k,k}$ (for some $1 \le k \le n$). 

What happens mathematically? Well, we now know for sure that $x = X_{k,k}$. In 
other words, the probability for this is now as large as 1: there is no doubt at all. So, 
our v has changed forever. After all, we now know that $|v_k|^2 = 1$. Since v has norm 
1, all other components must now vanish. Indeed, the particle can no longer lie at 
any other position but $X_{k,k}$. 

In summary, in your experiment, you spoiled the original v completely, with all 
the valuable information that was in it! Instead of the original interesting v, you now 
have a boring deterministic v: a standard unit vector. 
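
The collapse described above can be imitated on the computer. The sketch below (Python with NumPy) is only a toy model: the function name `measure_position`, the positions, and the state are assumptions made for illustration, and the random generator merely stands in for nature. It picks an index k with probability $|v_k|^2$ and replaces the state by the corresponding standard unit vector.

```python
import numpy as np

rng = np.random.default_rng(seed=0)    # fixed seed, for reproducibility only

def measure_position(state, positions, rng):
    """Sample an index k with probability |v_k|^2 and collapse the state."""
    state = state / np.linalg.norm(state)
    probs = np.abs(state) ** 2
    k = rng.choice(len(state), p=probs)
    collapsed = np.zeros_like(state)
    collapsed[k] = 1.0                 # standard unit vector e^(k)
    return positions[k], collapsed

positions = np.array([-1.0, 0.0, 1.0, 2.0])    # hypothetical diagonal of X
v = np.array([1 + 1j, 2.0, -1j, 0.5])
x_observed, v_after = measure_position(v, positions, rng)

# Measuring the collapsed state again returns the same position for sure.
x_again, _ = measure_position(v_after, positions, rng)
```

After the first call, all the original probabilities stored in v are gone: `v_after` has just one nonzero component.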

Here one may ask: why do we still need v? After all, we already got what we 
wanted: we discovered the true position! Still, v contained information not only about 
the position but also about many other physical properties. 

To appreciate better the information we may lose, let’s study our original v once 
again. The probability that $x = X_{k,k}$ is 

$$|v_k|^2 = |(e^{(k)}, v)|^2,$$

where $e^{(k)}$ is a standard unit vector: an eigenvector of X, with eigenvalue $X_{k,k}$. Still, 
position is not the only observable we’re interested in. We might also want to know 
the momentum p of the particle, in the x-spatial direction. 


7.3.2 The Momentum Matrix and Its Eigenvalues 


To have the momentum, we are given yet another n x n Hermitian matrix: P. Again, 
the physical state doesn’t tell us the momentum for sure: it just tells us how likely 
the particle is to have a certain momentum. For example, let $\lambda_k$ be an eigenvalue of 
P. How likely is the momentum to be $p = \lambda_k$? The probability for this is 

$$|(u^{(k)}, v)|^2,$$

where $u^{(k)}$ is the (normalized) eigenvector of P, corresponding to $\lambda_k$. 

Together, X and P give us all possible position-momentum pairs $(X_{j,j}, \lambda_k)$ 
($1 \le j, k \le n$). This makes a new two-dimensional grid of such possible pairs (Fig. 7.1). 

Fig. 7.1 The position-momentum grid. At each individual time t, solve for the n-dimensional state 
v. This way, v contains the entire physical information at time t, in terms of probability. To be at 
$x = X_{j,j}$, the particle has probability $|v_j|^2$. To have momentum $p = \lambda_k$, the particle has probability 
$|(u^{(k)}, v)|^2$ ($1 \le j, k \le n$). 

The above grid mirrors the phase plane in classical mechanics, which contains all 
possible position—momentum pairs. Ideally, the grid should have been infinite and 
continuous: a two-dimensional Cartesian plane. After all, in reality, both position 
and momentum could take any value, not just n discrete values. To model this, n 
should be infinite. In this case, an inner product should be interpreted as an infinite 
sum, or even an integral. For simplicity, however, we stick to our finite dimension n, 
and our original n-dimensional vectors and n x n matrices. 
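
As an illustration, the momentum probabilities can be computed with a standard Hermitian eigensolver. In this Python (NumPy) sketch, the matrix P is just a random Hermitian stand-in (an assumption: the text does not specify P), and the inner product is taken to conjugate its first argument.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 5

# A hypothetical Hermitian "momentum" matrix: (M + M^h) / 2 is always Hermitian.
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
P = (M + M.conj().T) / 2

# eigh is designed for Hermitian matrices: it returns real eigenvalues and
# orthonormal eigenvectors, stored as the columns of U.
eigenvalues, U = np.linalg.eigh(P)

v = rng.normal(size=n) + 1j * rng.normal(size=n)
v = v / np.linalg.norm(v)

# Probability of momentum p = lambda_k is |(u^(k), v)|^2.
momentum_probs = np.abs(U.conj().T @ v) ** 2
# Probability of position x = X_{j,j} is |v_j|^2.
position_probs = np.abs(v) ** 2
```

Both lists of probabilities sum to 1, because the eigenvectors of a Hermitian matrix can be chosen orthonormal, just like the standard unit vectors.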


7.3.3 Order Matters! 


Although you might be curious to know the exact position of the particle, better 
restrain yourself, and not look! Because, if you looked, then you’d spoil v forever, 
and lose the valuable probabilities of the form $|(u^{(k)}, v)|^2$ that gave you an idea about 
what the momentum was. 

This also works the other way around. Better not measure the momentum, because 
then you’d damage v, and lose the valuable probabilities $|v_k|^2$ about the original 
position. 

This means that order matters: measuring x and then measuring p may give 
different results from measuring p and then measuring x. In other words, applying 
X and then applying P is not the same as applying P and then applying X: they 
don’t commute with each other. 


7.3.4 Commutator 


So, it was indeed a good idea to use matrices to model physical observables. After 
all, matrices often don’t commute with each other. In our case, X and P indeed have 
a nonzero commutator: 

$$[X, P] = XP - PX \ne (0),$$

where ‘(0)’ stands for the zero n x n matrix. 

Recall that X and P are both Hermitian. As a result, [X, P] is an anti-Hermitian 
matrix: 

$$[X, P]^h = (XP - PX)^h = P^h X^h - X^h P^h = PX - XP = [P, X] = -[X, P].$$


This will be useful later. 
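
This property is easy to check numerically. The sketch below (Python with NumPy) uses two random Hermitian matrices as stand-ins for X and P; they are assumptions for illustration only, not the physical position and momentum.

```python
import numpy as np

def commutator(A, B):
    return A @ B - B @ A

def random_hermitian(n, rng):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

rng = np.random.default_rng(seed=2)
X = random_hermitian(4, rng)    # stand-ins, not the physical X and P
P = random_hermitian(4, rng)

C = commutator(X, P)
# Anti-Hermitian: the Hermitian adjoint equals minus the matrix itself.
is_anti_hermitian = np.allclose(C.conj().T, -C)
```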


7.3.5 Planck Constant 


To have this commutator in its explicit form, we have a new law of nature: 

$$[X, P] = i\hbar I,$$

where $i = \sqrt{-1}$ is the imaginary number, I is the n x n identity matrix, and $\hbar$ is 
called the Planck constant: a universal constant, positive, and very small. 

Why is this law plausible? Well, the error due to measuring in two different orders 
mustn’t depend on the original particle. After all, we might have the same error even 
if there was no particle at all. In fact, even with no particle at all, we might still 
measure a nonzero momentum or position. (This is called the ground state.) 

Thus, [X, P] should better be a constant matrix, independent of the particle under 
consideration. This is why it must be of the form $i\hbar I$: thanks to the imaginary number 
i, it is indeed anti-Hermitian, as required. Furthermore, thanks to the small constant 
$\hbar$, it is very small, and has an effect only in microscale or nanoscale, used in quantum 
mechanics. In macroscale, on the other hand, it has no practical effect whatsoever. 
This is why it was ignored in both geometrical mechanics (Chap. 2) and special 
relativity (Chap. 4). 


7.4 Observable and Its Expectation 


7.4.1 Observable or Measurable 


The matrices introduced above are called observables (or measurables, or experi- 
ments). After all, they let us observe. For example, by applying X to v, we get some 
idea about the nondeterministic position: its expectation at state v. 

The actual observation, on the other hand, should better wait until later. After all, 
it requires an experiment, which may spoil the original state, with all the valuable 
probabilities that could have been deduced from it about other observables, such as 
momentum. In the meantime, we can still “play” with our matrices, and apply all 
sorts of algebraic operations to them. 

Consider an n x n matrix A, not necessarily Hermitian. Let’s write it as the sum 
of two matrices: 

$$A = \frac{A + A^h}{2} + \frac{A - A^h}{2}.$$

The former term is called the Hermitian part of A. It is indeed Hermitian: 

$$\left(\frac{A + A^h}{2}\right)^h = \frac{A^h + A}{2} = \frac{A + A^h}{2}.$$

The latter term, on the other hand, is called the anti-Hermitian part of A. It is indeed 
anti-Hermitian: 

$$\left(\frac{A - A^h}{2}\right)^h = \frac{A^h - A}{2} = -\frac{A - A^h}{2}.$$
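
The same splitting can be carried out on the computer. In this short Python (NumPy) sketch, the function names `hermitian_part` and `anti_hermitian_part` and the random 3 x 3 matrix are illustrative assumptions.

```python
import numpy as np

def hermitian_part(A):
    return (A + A.conj().T) / 2

def anti_hermitian_part(A):
    return (A - A.conj().T) / 2

rng = np.random.default_rng(seed=3)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))

H_part = hermitian_part(A)         # equal to its own Hermitian adjoint
K_part = anti_hermitian_part(A)    # equal to minus its Hermitian adjoint
```

The two parts sum back to A, so no information is lost in the splitting.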


7.4.2 Symmetrization 


A proper observable should better be Hermitian. This way, its eigenvectors are orthog- 
onal to each other (and could be used to decompose any other vector), and its eigen- 
values are real (Chap. 1, Sects. 1.9.4—1.9.5). This is indeed a good property: after all, 
the eigenvalues stand for possible observed values, which are always real, with no 
imaginary part. 

Fortunately, the anti-Hermitian part of A is often as small as $\hbar$, and can be 
disregarded. This is called symmetrization: replacing A by its Hermitian part. Fortunately, 
this could wait until later. In the meantime, we can still stick to our original A, 
Hermitian or not. 


7.4.3 Observation 


So far, we’ve seen two observables: the position X, and the momentum P. In the 
context of special relativity, the time t could be viewed as an observable as well 
(Chap. 4). After all, to have the time, you must observe: either look at your own 
clock, to see the proper time, or at least look at someone else’s clock, to see a new 
observable: a different time (Figs. 4.3-4.5). 

Of course, in special relativity, the scale is so large that randomness has no effect. 
In every practical sense, the observables commute with each other, so everything is 
deterministic. 

In very small scales, on the other hand, randomness can no longer be ignored. 
On the contrary: the “true” physical state is no longer deterministic. It only gives us 
the probability to observe something, not the actual observation. This is the “true” 
nature: just probability. Of course, we humans will never get to see this “truth.” After 
all, we must make a decision: what to measure first, and what to measure later. Each 
choice may give us different results: the true original nature remains a mystery. 


7.4.4 Random Variable and Its Expectation 


Thus, an observable is more mathematical than physical. It makes a random variable: 
we can’t tell for sure what its value is. Fortunately, we can still tell what its value 
might be. For example, its value could be $\lambda$: some eigenvalue of the observable. The 
probability for this depends on the physical state v: it is $|(u, v)|^2$, where u is the 
corresponding (normalized) eigenvector. 

An observable like X or P must be Hermitian. A more general random variable, 
on the other hand, may be represented by a more general matrix A, not necessarily 
Hermitian. What is its expectation (or average) at state v? To find out, just apply A 
to v, and take the inner product with v. This gives a new complex number: 

$$(v, Av) = \left(v, \frac{A + A^h}{2} v\right) + \left(v, \frac{A - A^h}{2} v\right).$$

In this sum, the former term is real, and the latter term is imaginary. Thus, in terms 
of absolute value, each term is smaller than (or equal to) the entire sum: 

$$|(v, Av)| = \left|\left(v, \frac{A + A^h}{2} v\right) + \left(v, \frac{A - A^h}{2} v\right)\right| \ge \left|\left(v, \frac{A - A^h}{2} v\right)\right|.$$

This gives us a useful lower bound for the expectation. 


7.5 Heisenberg’s Uncertainty Principle 


7.5.1 Variance 


The original random variable might take all sorts of possible values. How likely are 
they to spread out, and differ from the average? To get some idea about this, we 
define the variance at state v: 

$$\|(A - (v, Av) I) v\|^2.$$


Let’s estimate the variances of our original random variables X and P. Fortunately, 
their variances have a lower bound: their covariance. 
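
Expectation and variance are straightforward to compute. The following Python (NumPy) sketch assumes a random (non-Hermitian) matrix A and a random normalized state v; the helper names `expectation` and `variance` are hypothetical. It also checks that the Hermitian part contributes the real part of the expectation, and the anti-Hermitian part the imaginary part.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
n = 4
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
v = rng.normal(size=n) + 1j * rng.normal(size=n)
v = v / np.linalg.norm(v)

def expectation(M, state):
    """(v, M v), with the inner product conjugating its first argument."""
    return np.vdot(state, M @ state)

def variance(M, state):
    """||(M - (v, M v) I) v||^2 at a normalized state."""
    shifted = M @ state - expectation(M, state) * state
    return np.linalg.norm(shifted) ** 2

mean = expectation(A, v)
real_term = expectation((A + A.conj().T) / 2, v)    # purely real
imag_term = expectation((A - A.conj().T) / 2, v)    # purely imaginary
var_A = variance(A, v)
```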


7.5.2 Covariance 


At state v, the expectation of X is (v, Xv), and the expectation of P is (v, Pv). Since 
these matrices are Hermitian, these expectations are real. 

Now, at state v, consider the product of the variances of X and P. How to estimate 
this from below? Well, thanks to the Cauchy—Schwarz inequality (exercises at the 
end of Chap. 1), 


$$\|(X - (v, Xv) I) v\| \cdot \|(P - (v, Pv) I) v\| \ge |((X - (v, Xv) I) v, (P - (v, Pv) I) v)|.$$

What do we have on the right-hand side? This is the covariance of X and P at state 
v. To estimate it from below, recall that, although X and P are both Hermitian, their 
product XP is not. Fortunately, we still have the estimate in Sect. 7.4.4: 

$$\begin{aligned}
\|(X - (v, Xv) I) v\| \cdot \|(P - (v, Pv) I) v\| &\ge |((X - (v, Xv) I) v, (P - (v, Pv) I) v)| \\
&= |(v, (X - (v, Xv) I)(P - (v, Pv) I) v)| \\
&\ge \frac{1}{2} |(v, [X - (v, Xv) I, P - (v, Pv) I] v)| \\
&= \frac{1}{2} |(v, [X, P] v)| \\
&= \frac{1}{2} |(v, i\hbar v)| \\
&= \frac{\hbar}{2}
\end{aligned}$$

(assuming that v has already been normalized in advance). 


7.5.3 Heisenberg’s Uncertainty Principle 


Finally, take the square of the above inequality. This gives a lower bound for the 
product of the variances of X and P: 

$$\|(X - (v, Xv) I) v\|^2 \, \|(P - (v, Pv) I) v\|^2 \ge \frac{\hbar^2}{4}.$$


This is Heisenberg’s uncertainty principle. It tells you that you can’t enjoy both 
worlds. If you measured the precise position, then the variance of X becomes zero. 
Unfortunately, there is a price to pay: the variance of P becomes huge. As a result, 
there is no hope to measure the original momentum any more. 

This also works the other way around: if you measured the precise momentum, 
then the variance of P would vanish. Unfortunately, in this case, there is yet another 
price to pay: the variance of X would become infinite. As a result, there is no hope 
to measure the original position any more! 
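
The intermediate estimate behind this principle holds for any pair of Hermitian matrices, so it can be tested numerically. The Python (NumPy) sketch below uses random Hermitian stand-ins for X and P (an assumption: no finite matrices have a commutator exactly equal to a multiple of the identity, since the trace of a commutator is zero). It checks that the product of the two spreads is never below half the absolute expectation of the commutator.

```python
import numpy as np

def random_hermitian(n, rng):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

def spread(M, state):
    """||(M - (v, M v) I) v||: square root of the variance at a normalized state."""
    mean = np.vdot(state, M @ state).real
    return np.linalg.norm(M @ state - mean * state)

rng = np.random.default_rng(seed=5)
n = 6
gaps = []
for _ in range(200):
    X = random_hermitian(n, rng)
    P = random_hermitian(n, rng)
    v = rng.normal(size=n) + 1j * rng.normal(size=n)
    v = v / np.linalg.norm(v)
    lower_bound = 0.5 * abs(np.vdot(v, (X @ P - P @ X) @ v))
    gaps.append(spread(X, v) * spread(P, v) - lower_bound)

# Never negative (up to rounding), as the inequality predicts.
smallest_gap = min(gaps)
```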

This is why we better wait with the actual observation until later. In the meantime, 
we can still “play” with our original observables algebraically, to design all sorts of 
new interesting observables. 


7.6 Eigenvalues 


7.6.1 Shifting an Eigenvalue 


Let’s use the power of linear algebra to study the commutator. Let C and T be n x n 
matrices, not necessarily Hermitian. Assume also that they don’t commute with each 
other. On the contrary: they have a nonzero commutator, proportional to T: 

$$[C, T] = CT - TC = \alpha T,$$

for some complex number $\alpha \ne 0$. 

Let u be an eigenvector of C, with the eigenvalue $\lambda$: 

$$Cu = \lambda u.$$

How to find more eigenvectors? Just apply T to u. Indeed, if Tu is a nonzero vector, 
then it is an eigenvector of C as well: 

$$CTu = (TC + [C, T]) u = (TC + \alpha T) u = (\lambda + \alpha) Tu.$$

In summary, we’ve managed to “shift” $\lambda$ by $\alpha$, obtaining a new eigenvalue of C: 
$\lambda + \alpha$. Moreover, so long as we don’t hit the zero vector, we can now repeat this 
procedure time and again, obtaining more and more eigenvectors of C, with new 
eigenvalues, shifted more and more: 

$$\lambda, \ \lambda + \alpha, \ \lambda + 2\alpha, \ \lambda + 3\alpha, \ \ldots.$$


Of course, because n is finite, this process must stop somewhere. Still, if n were 
infinite, then it could proceed indefinitely. 
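
Here is a small worked instance in Python (NumPy). The particular choice of C (a diagonal matrix) and T (a shift matrix) is an assumption, picked only because it satisfies $[C, T] = \alpha T$ with $\alpha = 1$. Starting from one eigenvector, T is applied repeatedly until the zero vector is hit.

```python
import numpy as np

n = 5
C = np.diag(np.arange(n, dtype=float))    # eigenvalues 0, 1, ..., n-1
T = np.diag(np.ones(n - 1), k=-1)         # T maps e_k to e_{k+1}, and e_{n-1} to 0

alpha = 1.0
commutator_ok = np.allclose(C @ T - T @ C, alpha * T)

u = np.zeros(n)
u[0] = 1.0                                # eigenvector of C with eigenvalue 0
eigenvalues_found = []
while np.linalg.norm(u) > 0:
    # The Rayleigh quotient recovers the eigenvalue of the current eigenvector.
    eigenvalues_found.append(u @ C @ u / (u @ u))
    u = T @ u                             # shift the eigenvalue by alpha
```

Because n is finite, the loop stops after n steps, once T produces the zero vector.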


7.6.2 Shifting an Eigenvalue of a Product 


Let’s use the above in a special case. For this purpose, let A and B be n x n matrices 
(not necessarily Hermitian). Assume that they don’t commute with each other. On 
the contrary: they have a nonzero commutator, proportional to the identity matrix: 

$$[A, B] = AB - BA = \beta I,$$

for some complex number $\beta \ne 0$. 

Now, let’s look at the product BA. What is its commutator with B? We already 
know what it is: 

$$[BA, B] = BAB - BBA = B(AB - BA) = B[A, B] = \beta B.$$

We can now use the result in Sect. 7.6.1, to shift an eigenvalue of BA. For this 
purpose, let u be an eigenvector of BA with the eigenvalue $\lambda$: 

$$BAu = \lambda u.$$

If Bu is a nonzero vector, then it is an eigenvector of BA as well: 

$$BA(Bu) = (\lambda + \beta) Bu.$$


Likewise, one could also shift in the opposite direction. For this purpose, look again 
at the product BA. What is its commutator with A? It is just 

$$[BA, A] = BAA - ABA = (BA - AB)A = -[A, B]A = -\beta A.$$

We can now use the result in Sect. 7.6.1 once again, to design a new eigenvector of 
BA. For this purpose, take the original eigenvector u, and apply A to it: 

$$BA(Au) = (\lambda - \beta) Au.$$

This way, if Au is a nonzero vector, then it is indeed an eigenvector of BA as well, 
with the new eigenvalue $\lambda - \beta$. So long as we don’t hit the zero vector, we can now 
repeat this procedure time and again, and design more and more eigenvectors of BA, 
with new eigenvalues: 

$$\lambda, \ \lambda \pm \beta, \ \lambda \pm 2\beta, \ \lambda \pm 3\beta, \ \ldots.$$

Of course, since n is finite, this process must stop somewhere. Still, if n were infinite, 
then it might continue forever. 


7.6.3 A Number Operator 


In the above, consider a special case, in which 

$$B = A^h \quad \text{and} \quad \beta = 1.$$

This way, the assumption in the beginning of Sect. 7.6.2 takes the form 

$$[A, A^h] = I.$$

Furthermore, the product studied above is now 

$$BA = A^h A.$$

As a Hermitian matrix, $A^h A$ has real eigenvalues only. Furthermore, thanks to the 
above discussion, we can now lower or raise an eigenvalue of $A^h A$: from $\lambda$ to 

$$\lambda + 1, \ \lambda + 2, \ \lambda + 3, \ \ldots$$

(until hitting the zero vector), and also to 

$$\lambda - 1, \ \lambda - 2, \ \lambda - 3, \ \ldots$$

(until hitting the zero vector, which will be very soon). 

$A^h A$ is an important matrix: it is called a number operator. Why? Because its 
eigenvalues are 0, 1, 2, 3, 4, .... 
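
A finite matrix with these properties can be written down explicitly, up to one caveat. In the Python (NumPy) sketch below, A is a truncated n x n lowering matrix (an assumption made for illustration). Its commutator with $A^h$ agrees with the identity everywhere except in the last diagonal entry: this is unavoidable for finite n, because the trace of a commutator is always zero.

```python
import numpy as np

n = 6
# Truncated lowering matrix: A e_k = sqrt(k) e_{k-1}, and A e_0 = 0.
A = np.diag(np.sqrt(np.arange(1, n)), k=1)
A_h = A.conj().T

number_operator = A_h @ A
eigenvalues = np.linalg.eigvalsh(number_operator)    # 0, 1, ..., n-1

# [A, A^h]: the identity, except for the last diagonal entry.
comm = A @ A_h - A_h @ A

# Ladder action on the eigenvector with eigenvalue 2.
u = np.zeros(n)
u[2] = 1.0
lowered = A @ u      # eigenvector with eigenvalue 1
raised = A_h @ u     # eigenvector with eigenvalue 3
```

Away from the top of the ladder, the truncated matrices behave just as in the text: applying A lowers the eigenvalue by 1, and applying $A^h$ raises it by 1.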


7.6.4 Eigenvalue—Expectation 


Indeed, let u be an eigenvector of $A^h A$, with the eigenvalue $\lambda$: 

$$A^h A u = \lambda u.$$

With some effort, both $\lambda$ and u could have been calculated (Chap. 3, Sect. 3.1.1). 
Better yet, there is no need to calculate them explicitly at all! 

Since $A^h A$ is Hermitian, $\lambda$ must be real (Chap. 1, Sect. 1.9.4). Could it be negative? 
No! Indeed, look at the expectation of $A^h A$ at u: 

$$\lambda (u, u) = (u, A^h A u) = (Au, Au) \ge 0.$$


Let’s use u to design more eigenvectors. 


7.6.5 Ladder Operator: Lowering an Eigenvalue 


Recall that we consider now a special case: in Sect. 7.6.2, set $B = A^h$, and $\beta = 1$. 
This way, the above eigenvector u could be used to design a new eigenvector: Au. 
Indeed, if Au is a nonzero vector, then it is an eigenvector of $A^h A$ in its own right, 
with a smaller eigenvalue: $\lambda - 1$. 

In this context, the matrix A serves as a “ladder” operator. By applying it to u, we 
go down the ladder, to a smaller eigenvalue. 

What is the norm of Au? To find out, look at the expectation of $A^h A$ at u: 

$$\|Au\|^2 = (Au, Au) = (u, A^h A u) = \lambda (u, u) = \lambda \|u\|^2.$$

In other words, 

$$\|Au\| = \sqrt{\lambda} \, \|u\|.$$


Later on, we’ll make sure to normalize the eigenvectors. 


7.6.6 Null Space 


This lowering procedure can’t continue forever, or we’d hit a negative eigenvalue, 
which is impossible (Sect. 7.6.4). It must stop upon reaching some eigenvector w 
whose eigenvalue is zero. At this stage, lowering is no longer possible. So, we see in 
retrospect that our $\lambda$ must have been a nonnegative integer number: this is the only 
way to make sure that the lowering procedure eventually hits zero, and stops. 

In summary, we must eventually reach a new vector w, for which 

$$A^h A w = 0.$$

What does w look like? Well, at w, $A^h A$ must have zero expectation: 

$$(Aw, Aw) = (w, A^h A w) = (w, 0) = 0.$$

Therefore, 

$$Aw = 0$$

as well. Thus, w lies in the null spaces of both A and $A^h A$ (Chap. 1, Sect. 1.9.2, and 
Chap. 3, Sect. 3.1.1). 

Fortunately, we can now go ahead and apply the reverse procedure to w, to obtain 
bigger eigenvalues back again. 


7.6.7 Raising an Eigenvalue 


For this purpose, let’s use Sect. 7.6.2 once again (again, with $B = A^h$ and $\beta = 1$). 
This way, from our original eigenvector u, we can now form $A^h u$: a new eigenvector 
of $A^h A$, with a bigger eigenvalue: $\lambda + 1$. In this context, $A^h$ serves as a new ladder 
operator, to help “climb” up the ladder. 

What is the norm of $A^h u$? Well, since $[A, A^h] = I$, 

$$\begin{aligned}
\|A^h u\|^2 &= (A^h u, A^h u) \\
&= (u, A A^h u) \\
&= (u, (A^h A + [A, A^h]) u) \\
&= (u, (A^h A + I) u) \\
&= (u, (\lambda + 1) u) \\
&= (\lambda + 1)(u, u).
\end{aligned}$$

In other words, 

$$\|A^h u\| = \sqrt{\lambda + 1} \, \|u\|.$$


Later on, we’ll make sure to normalize these eigenvectors, as required. 


7.7 Hamiltonian and Its Eigenvalues 


7.7.1 Hamiltonian of the Harmonic Oscillator 


So far, we’ve studied the number operator $A^h A$ from an algebraic point of view: its 
eigenvalues and eigenvectors. Still, what is its physical meaning? To see this, let’s 
model a harmonic oscillator: a spring. 

To model position and momentum, we already have our Hermitian matrices X 
and P. Let’s use them to define a new matrix, the Hamiltonian: 

$$H = \frac{1}{2m} P^2 + \frac{m\omega^2}{2} X^2,$$

where the mass m and the frequency $\omega$ are given parameters. (Don’t confuse $\omega$ with 
the vector w in Sect. 7.6.6, or with the angular velocity in geometrical mechanics.) 
This is the Hamiltonian observable. It will be used to observe the total energy in the 
harmonic oscillator, kinetic and potential alike. 

What could the total energy be? Well, it must be an eigenvalue of H. At what 
probability? This already depends on the current state v: normalize it, calculate its 
inner product with a (normalized) eigenvector, take the absolute value, and square it. 


7.7.2 Concrete Number Operator 


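What do the eigenvalues of H look like? To see this, let’s define A in Sect. 7.6.3 more 
concretely: 

$$A = \sqrt{\frac{m\omega}{2\hbar}} \left(X + \frac{i}{m\omega} P\right).$$

This way, its Hermitian adjoint is 

$$A^h = \sqrt{\frac{m\omega}{2\hbar}} \left(X - \frac{i}{m\omega} P\right),$$

so the commutator is 

$$[A, A^h] = \frac{m\omega}{2\hbar} \left[X + \frac{i}{m\omega} P, \ X - \frac{i}{m\omega} P\right] = -\frac{m\omega}{2\hbar} \cdot \frac{i}{m\omega} \left([X, P] - [P, X]\right) = I,$$

as in Sect. 7.6.3. For this reason, the above properties still hold, including raising and 
lowering eigenvalues. To construct normalized eigenvectors of $A^h A$, just start from 
some w in the null space of A, normalize it to have norm 1, apply $A^h$ time and again, 
and normalize: 

$$w, \quad A^h w, \quad \frac{1}{\sqrt{2!}} (A^h)^2 w, \quad \frac{1}{\sqrt{3!}} (A^h)^3 w, \quad \ldots, \quad \frac{1}{\sqrt{k!}} (A^h)^k w, \quad \ldots.$$

Since $A^h A$ is Hermitian, these are orthonormal eigenvectors of $A^h A$, which can help 
decompose just any vector (Chap. 1, Sects. 1.9.5 and 1.10.5). 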
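
This construction can be followed step by step on the computer. The Python (NumPy) sketch below again assumes a truncated n x n lowering matrix A (an illustrative choice, not given in the text). It starts from w in the null space of A, applies $A^h$ repeatedly, and divides the kth vector by $\sqrt{k!}$.

```python
import numpy as np
from math import factorial

n = 6
A = np.diag(np.sqrt(np.arange(1, n)), k=1)    # hypothetical truncated A
A_h = A.conj().T

w = np.zeros(n)
w[0] = 1.0                                    # A w = 0, and ||w|| = 1

# kth eigenvector: (A^h)^k w / sqrt(k!)
vectors = []
current = w.copy()
for k in range(n):
    vectors.append(current / np.sqrt(factorial(k)))
    current = A_h @ current                   # unnormalized (A^h)^(k+1) w

U = np.column_stack(vectors)                  # eigenvectors stored as columns
```

The columns of `U` are orthonormal, and the kth column is an eigenvector of $A^h A$ with eigenvalue k.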




7.7.3 Energy Levels 


So, our concrete number operator still has the same algebraic properties as before. 
Still, what is its physical meaning? To see this, let’s calculate it explicitly: 

$$\begin{aligned}
A^h A &= \frac{m\omega}{2\hbar} \left(X - \frac{i}{m\omega} P\right) \left(X + \frac{i}{m\omega} P\right) \\
&= \frac{m\omega}{2\hbar} \left(X^2 + \frac{1}{m^2\omega^2} P^2 + \frac{i}{m\omega} [X, P]\right) \\
&= \frac{m\omega}{2\hbar} \left(X^2 + \frac{1}{m^2\omega^2} P^2\right) + \frac{i}{2\hbar} \, i\hbar I \\
&= \frac{1}{\hbar\omega} H - \frac{1}{2} I.
\end{aligned}$$

Thus, the Hamiltonian matrix is strongly related to the concrete number operator: 

$$H = \hbar\omega \left(A^h A + \frac{1}{2} I\right).$$

Thus, the total energy in the harmonic oscillator has a very simple form. After all, it 
is just an eigenvalue of H: 

$$\frac{\hbar\omega}{2}, \quad \frac{3\hbar\omega}{2}, \quad \frac{5\hbar\omega}{2}, \quad \frac{7\hbar\omega}{2}, \quad \ldots.$$

In units as small as $\hbar$, this is just the frequency $\omega$, times a (nonnegative) integer number 
plus one half. This is indeed quantum mechanics: energy is no longer continuous, 
but comes in discrete levels. 

Furthermore, the eigenvectors of H are the same as those of $A^h A$, designed 
above. Thanks to conservation of energy, each of them makes a constant state, with 
no dynamics at all: if your initial physical state is an eigenvector, then it must remain 
so forever. After all, it must preserve the same energy level, its eigenvalue. For this 
reason, the wave function must be a standing wave that never moves. Let’s see what 
this looks like. 
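
The energy levels can be reproduced numerically, with one caveat. The Python (NumPy) sketch below assumes illustrative values for $\hbar$, m, and $\omega$, and a truncated n x n lowering matrix A, from which X and P are recovered by inverting the definition of A. Because of the truncation, only the first n-1 diagonal entries of H match the formula: the last one is an artifact of the finite dimension.

```python
import numpy as np

hbar, m, omega = 1.0, 1.0, 2.0      # arbitrary illustrative values
n = 8
A = np.diag(np.sqrt(np.arange(1, n)), k=1).astype(complex)
A_h = A.conj().T

# Invert A = sqrt(m*omega/(2*hbar)) * (X + i P / (m*omega)) and its adjoint.
X = np.sqrt(hbar / (2 * m * omega)) * (A + A_h)
P = 1j * np.sqrt(m * omega * hbar / 2) * (A_h - A)

# Hamiltonian of the harmonic oscillator: kinetic plus potential energy.
H = P @ P / (2 * m) + 0.5 * m * omega**2 * (X @ X)

levels = np.diag(H).real            # H turns out to be diagonal in this basis
expected = hbar * omega * (np.arange(n) + 0.5)
```

Up to the last entry, `levels` agrees with `expected`: the frequency times an integer plus one half, in units of $\hbar$.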


7.7.4 Ground State 


First, let’s look at w—the eigenvector that lies in the null space of A (Fig. 7.2). What 
is its physical meaning? Well, it represents a very strange case: no momentum, and 
no motion at all! 

Fig. 7.2 Gaussian distribution: w lies in the null spaces of both A and $A^h A$, with zero expectation: 
$(Aw, Aw) = (w, A^h A w) = 0$. To be at $x = X_{k,k}$, the particle has probability $|w_k|^2$. Thus, it is 
highly likely to be at x = 0, the expectation. 

This is the minimal energy level. Indeed, w is in the null space of $A^h A$ as well: 

$$A^h A w = 0.$$

For this reason, w is an eigenvector of H as well: 

$$Hw = \hbar\omega \left(A^h A + \frac{1}{2} I\right) w = \frac{\hbar\omega}{2} w.$$

Thus, w is indeed the ground state: even with no particle at all, there is still some 
minimal energy. This energy is very small: in units as small as $\hbar$, it has just $\omega/2$ 
units, where $\omega$ is the frequency. 


7.7.5 Gaussian Distribution 


What does w look like? Well, it makes a Gaussian distribution, with zero expectation. 
In Fig. 7.2, we illustrate the components $w_k$ as a function of x. To lie at $x = X_{k,k}$, 
the particle has probability $|w_k|^2$ (provided that $\|w\| = 1$). 


7.8 Coherent States 


7.8.1 Dynamic State 


In Schrödinger’s picture, the matrices X, P, and H never change (Sect. 7.2.3). The 
dynamics is in the state, which may change in time, along with the physical 
information it carries: the probabilities encapsulated in it. 

What can such a moving state look like? Well, let’s try a vector we already know: 
a (normalized) eigenvector of H: 

$$\frac{1}{\sqrt{k!}} (A^h)^k w.$$

But this is no good: there is no dynamics here. After all, can this state ever change? 
No, it can’t! Indeed, the total energy must remain the same eigenvalue of H. So, 
to introduce a time dependence, the best you can do is to multiply this state by the 
(complex) number $\exp(i\omega t)$. But this introduces no dynamics. After all, this has no 
effect on the probabilities. Besides, the state is defined up to a scalar multiple only. 
Thus, this is still a standing wave function that travels nowhere. 

A standing wave is rather rare: most waves travel in time. For this purpose, let’s 
turn to a more general state v. Let’s expand it in terms of the orthonormal eigenvectors 
of H: 

$$v = \sum_{k \ge 0} \left(\frac{1}{\sqrt{k!}} (A^h)^k w, \ v\right) \frac{1}{\sqrt{k!}} (A^h)^k w.$$

In this expansion, look at the kth coefficient: take its absolute value, and square it: 

$$\left|\left(\frac{1}{\sqrt{k!}} (A^h)^k w, \ v\right)\right|^2 = \frac{1}{k!} \left|\left((A^h)^k w, \ v\right)\right|^2.$$

Assuming that $\|v\| = 1$, this is the probability to have energy level $\hbar\omega(k + 1/2)$. 
This probability is not necessarily constant: it may change in time, along with the 
entire state v. This is how the wave function can indeed travel. 


7.8.2 Coherent State 


For example, assume that v is a coherent state: an eigenvector of A, not of $A^h A$: 

$$Av = \lambda v,$$

where $\lambda$ is now not necessarily real: it may have a nonzero imaginary part. After all, 
A is not Hermitian. 

Fig. 7.3 A coherent state: a Gaussian distribution, shifted by a complex number $\lambda$. For each k, 
$|v_k|^2$ is the probability to be at $x = X_{k,k}$. 

What does v look like? Well, it makes a Gaussian distribution, shifted by the 
complex number $\lambda$. This is illustrated in Fig. 7.3: to lie at position $x = X_{k,k}$, the 
particle has probability $|v_k|^2$. 


7.8.3 Coherent State—Nondeterministic Energy 


In a coherent state, we’ve just seen the probability to be at position $x = X_{k,k}$. This 
is illustrated in Fig. 7.3. Still, there is yet another interesting question: what is the 
probability to have a certain amount of energy? Well, not every amount is allowed, 
but only our discrete energy levels. Fortunately, the probability to be at the kth energy 
level is already available in Sect. 7.8.1. In a coherent state, it is even simpler: 

$$\begin{aligned}
\frac{1}{k!} \left|\left((A^h)^k w, \ v\right)\right|^2 &= \frac{1}{k!} \left|\left(w, \ A^k v\right)\right|^2 \\
&= \frac{1}{k!} \left|\left(w, \ \lambda^k v\right)\right|^2 \\
&= \frac{1}{k!} |\lambda|^{2k} |(w, v)|^2 \\
&= |(w, v)|^2 \frac{|\lambda|^{2k}}{k!}.
\end{aligned}$$

Here, we have the factor $|(w, v)|^2$. How to estimate it? For this purpose, note that 
the probabilities must sum to 1: 

$$1 = |(w, v)|^2 \sum_{k \ge 0} \frac{|\lambda|^{2k}}{k!} = |(w, v)|^2 \exp(|\lambda|^2).$$

As a result, 

$$|(w, v)|^2 = \exp(-|\lambda|^2).$$


7.8.4 Poisson Distribution 


This is the Poisson distribution (Fig. 7.4). Unlike the Gaussian distribution, it contains no geometrical information: it doesn’t tell us where the particle might lie on the $x$-axis, but only what energy it may have. In fact, the probability to have energy $\hbar\omega(k + 1/2)$ is

\[ \exp\left(-|\lambda|^2\right) \frac{|\lambda|^{2k}}{k!}, \]

where $\lambda$ is the eigenvalue of the coherent state with respect to $A$.

Thus, our coherent state is essentially different from the ground state, or from any other eigenvector of $A^h A$. Indeed, in a coherent state, energy is no longer known or conserved. On the contrary, it is nondeterministic: it is not known for sure, and its probabilities can even change in time.
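
As a quick numerical check of the Poisson formula above, here is a minimal sketch in plain Python. The function name and the sample eigenvalue are illustrative assumptions, not part of the text. It lists the probabilities for the first few energy levels, sums them, and locates the most likely level.

```python
import math

def coherent_energy_probabilities(lam, kmax):
    """Probabilities exp(-|lam|^2) * |lam|^(2k) / k! for k = 0, ..., kmax."""
    s = abs(lam) ** 2
    return [math.exp(-s) * s ** k / math.factorial(k) for k in range(kmax + 1)]

lam = 1.5 + 2.0j                      # hypothetical eigenvalue, |lam|^2 = 6.25
probs = coherent_energy_probabilities(lam, 60)

total = sum(probs)                    # close to 1: the tail beyond k = 60 is negligible
k_peak = max(range(len(probs)), key=probs.__getitem__)   # most likely energy level

print(total, k_peak)
```

Because $k$ is an integer, the most likely level is the integer next to $|\lambda|^2$ (here $k = 6$ for $|\lambda|^2 = 6.25$).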


7.9 Particle in Three Dimensions 


7.9.1 Tensor Product 


So far, we’ve seen three important observables: position, momentum, and energy. Let’s design a new observable: angular momentum. For this purpose, we need to introduce a new (discrete) spatial dimension.

Fig. 7.4 The Poisson distribution. To have energy $\hbar\omega(k + 1/2)$, the probability is $\exp(-|\lambda|^2)|\lambda|^{2k}/k!$, where $\lambda$ is the eigenvalue of the coherent state with respect to $A$. The maximal probability is at $k = |\lambda|^2$

Fig. 7.5 The discrete two-dimensional grid: $m$ horizontal rows, of $m$ points each. Since $n = m^2$, our general state $v$ makes a (complex) grid function, defined at each grid point. Furthermore, in each individual row, $X$ acts in the same way: it couples grid points in the same row

Assume now that $n = m^2$, for some integer $m$. This way, our general state $v$ can be viewed not only as a vector but also as a grid function:

\[ v \in \mathbb{C}^n \cong \mathbb{C}^{m \times m} \]


(Fig. 7.5). Let’s redefine our observables in the new m x m grid. To represent 
x-position, for example, our new position matrix should act in each horizontal row 
individually: it should couple grid points in the same row, not mix different rows. 
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Let x and p denote the x-position and x-momentum in one horizontal row in the 
grid. Let X and P be the corresponding m x m matrices, acting in just one horizontal 
row in the grid. How to extend them to the entire grid? 

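For this purpose, let $I$ be the $m \times m$ identity matrix. Let’s use $X$, $P$, and $I$ to define extended $n \times n$ block diagonal matrices that act in the $x$-spatial direction, $x$-row by $x$-row:

\[
X \otimes I = \begin{pmatrix} X & & \\ & \ddots & \\ & & X \end{pmatrix}
\quad \text{and} \quad
P \otimes I = \begin{pmatrix} P & & \\ & \ddots & \\ & & P \end{pmatrix}.
\]

Here, the new symbol ‘$\otimes$’ produces a bigger matrix: the tensor product of two smaller matrices.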


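Likewise, to act in the $y$-spatial direction in the grid ($y$-column by $y$-column), define new $n \times n$ (Hermitian) matrices:

\[
I \otimes X = \begin{pmatrix} X_{1,1} I & & & \\ & X_{2,2} I & & \\ & & \ddots & \\ & & & X_{m,m} I \end{pmatrix}
\quad \text{and} \quad
I \otimes P = \begin{pmatrix} P_{1,1} I & \cdots & P_{1,m} I \\ \vdots & \ddots & \vdots \\ P_{m,1} I & \cdots & P_{m,m} I \end{pmatrix}.
\]

These new definitions are quite useful. For example, how likely is the particle to be at position $(X_{i,i}, X_{j,j})$? Well, the probability for this is just $|v_{i,j}|^2$. After all, the state $v$ is now interpreted as a grid function, with two indices.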


7.9.2 Commutativity 


Thus, $X \otimes I$ is completely different from $I \otimes X$: the former acts on the individual $x$-rows, whereas the latter acts on the individual $y$-columns in the grid. Because they act in different spatial directions, these matrices commute with each other. Furthermore, although $X$ and $P$ don’t commute, $X \otimes I$ and $I \otimes P$ do:

\[
(X \otimes I)(I \otimes P) = \begin{pmatrix} P_{1,1} X & \cdots & P_{1,m} X \\ \vdots & \ddots & \vdots \\ P_{m,1} X & \cdots & P_{m,m} X \end{pmatrix} = (I \otimes P)(X \otimes I).
\]

Likewise, $P \otimes I$ and $I \otimes X$ do commute with each other:

\[ (P \otimes I)(I \otimes X) = (I \otimes X)(P \otimes I). \]
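
The commutation claims above can be checked on a small grid. The following is a minimal sketch in plain Python, under stated assumptions: the helper names (`kron`, `book_tensor`, and so on) and the sample $3 \times 3$ matrices are hypothetical, chosen only for illustration. Note the ordering convention: the block-diagonal matrix called $X \otimes I$ in the text is what the standard Kronecker product (as in `numpy.kron`) would call $I \otimes X$, so the helper `book_tensor` swaps its arguments.

```python
def identity(m):
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]

def kron(A, B):
    """Standard Kronecker product: block (i, j) equals A[i][j] * B."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def book_tensor(A, B):
    """Tensor product in the ordering used in the text: A acts inside each row."""
    return kron(B, A)

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Illustrative choices (assumptions): a diagonal "position" matrix and
# a Hermitian "momentum-like" matrix on m = 3 points.
X = [[-1, 0, 0], [0, 0, 0], [0, 0, 1]]
P = [[0, -0.5j, 0], [0.5j, 0, -0.5j], [0, 0.5j, 0]]
I = identity(3)

XI = book_tensor(X, I)   # block diagonal: X repeated along the diagonal
IP = book_tensor(I, P)   # block (i, j) equals P[i][j] * I

commute_gap_2d = max_abs_diff(matmul(XI, IP), matmul(IP, XI))  # different directions
commute_gap_1d = max_abs_diff(matmul(X, P), matmul(P, X))      # same direction

print(commute_gap_2d, commute_gap_1d)
```

The first gap vanishes, whereas the second does not: matrices acting in different spatial directions commute, even though $X$ and $P$ themselves don’t.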


Let’s extend this to a yet higher dimension. 


7.9.3 Three-Dimensional Grid


So far, we’ve only considered scalar position and momentum: $x$ and $p$. Let’s move on to a three-dimensional position:

\[ r = \begin{pmatrix} x \\ y \\ z \end{pmatrix} \]

(Chap. 2, Sect. 2.4.1). This way, $r$ encapsulates three degrees of freedom: the positions in the $x$-, $y$-, and $z$-coordinates. Likewise, $p$ is now a three-dimensional vector: the linear momentum, containing the $x$-, $y$-, and $z$-scalar momenta. In quantum mechanics, each component should be mirrored by a matrix.

For this purpose, assume now that $n = m^3$, for some integer $m$. This way, our general state $v$ can be interpreted not only as a vector but also as a grid function:

\[ v \in \mathbb{C}^n \cong \mathbb{C}^{m \times m \times m}. \]


Let’s take our m x m matrices X and P and place them in suitable tensor products. 
This way, we obtain extended n x n matrices, which may help observe position and 
momentum in the x-, y-, and z-coordinates: 


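\[
\begin{aligned}
R_x &= X \otimes I \otimes I \\
R_y &= I \otimes X \otimes I \\
R_z &= I \otimes I \otimes X \\
P_x &= P \otimes I \otimes I \\
P_y &= I \otimes P \otimes I \\
P_z &= I \otimes I \otimes P.
\end{aligned}
\]

These new definitions are quite useful. For example, how likely is the particle to be at position $(X_{i,i}, X_{j,j}, X_{k,k})$? Well, the probability for this is just $|v_{i,j,k}|^2$. After all, the state $v$ is now interpreted as a grid function, with three indices.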

Do these new matrices commute with each other? Well, it depends: if they act in different spatial directions, then they do (Sect. 7.9.2). For example,

\[ [R_x, P_y] = (0). \]

If, on the other hand, they act in the same spatial direction, then they don’t. For example,

\[ [R_x, P_x] = i\hbar\, I \otimes I \otimes I. \]


This is used next. 


7.10 Angular Momentum 


7.10.1 Angular Momentum Component 


Thanks to these new matrices, we can now define a new kind of observable: angular 
momentum component. This mirrors the original (deterministic) angular momentum: 


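\[
\begin{aligned}
L_x &= R_y P_z - R_z P_y \\
L_y &= R_z P_x - R_x P_z \\
L_z &= R_x P_y - R_y P_x.
\end{aligned}
\]

Are these indeed legitimate observables? Well, thanks to Sects. 7.9.2–7.9.3, they are indeed Hermitian. For instance,

\[ L_x^h = (R_y P_z - R_z P_y)^h = P_z^h R_y^h - P_y^h R_z^h = P_z R_y - P_y R_z = R_y P_z - R_z P_y = L_x. \]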


This matrix mirrors the x-component of the vector product r x p (Chap. 2, Sect. 2.4.2). 
Here, however, we have an algebraic advantage: two matrices can combine to produce 
the third one. This shows once again how clever it is to use matrices to model physical 
observables. 


7.10.2 Using the Commutator 


The above matrices don’t commute with each other. On the contrary: thanks to Sects. 7.3.4 and 7.9.2–7.9.3, they have a nonzero commutator:

\[
\begin{aligned}
[L_x, L_y] &= [R_y P_z - R_z P_y, R_z P_x - R_x P_z] \\
&= [R_y P_z, R_z P_x] - [R_y P_z, R_x P_z] - [R_z P_y, R_z P_x] + [R_z P_y, R_x P_z] \\
&= R_y P_x [P_z, R_z] - R_y R_x [P_z, P_z] - P_y P_x [R_z, R_z] + P_y R_x [R_z, P_z] \\
&= -i\hbar R_y P_x - (0) - (0) + i\hbar P_y R_x \\
&= i\hbar L_z.
\end{aligned}
\]

The same works in the other components as well:

\[
\begin{aligned}
[L_x, L_y] &= i\hbar L_z \\
[L_y, L_z] &= i\hbar L_x \\
[L_z, L_x] &= i\hbar L_y.
\end{aligned}
\]

In summary, $L_x$, $L_y$, and $L_z$ don’t commute. On the contrary: each two have a nonzero commutator, namely the third one (times $i\hbar$).


7.10.3 Ladder Operator: Raising an Eigenvalue 


Let’s use the above to raise an eigenvalue. For instance, let $u$ be an eigenvector of $L_z$ with the eigenvalue $\lambda$:

\[ L_z u = \lambda u. \]

Since $L_z$ is Hermitian, $\lambda$ must be real. How to raise it? For this purpose, define a new matrix:

\[ T \equiv L_x + i L_y. \]

It doesn’t commute with $L_z$. On the contrary: they have a nonzero commutator:

\[
\begin{aligned}
[L_z, T] &= [L_z, L_x + i L_y] \\
&= [L_z, L_x] + i [L_z, L_y] \\
&= i\hbar L_y + \hbar L_x \\
&= \hbar T.
\end{aligned}
\]

Thanks to Sect. 7.6.1, we can now raise $\lambda$: if $Tu$ is a nonzero vector, then it is an eigenvector of $L_z$ as well, with the bigger eigenvalue $\lambda + \hbar$. This way, $T$ serves as a ladder operator, to help “climb” up the ladder. Furthermore, this procedure can now repeat time and again, producing bigger and bigger eigenvalues, until hitting the zero vector, and the maximal eigenvalue of $L_z$.

These eigenvalues are the only values that the z-angular momentum might take. 
This is indeed quantum mechanics: angular momentum is no longer continuous. On 
the contrary: it may take certain discrete values only. 
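
To see the ladder at work on a concrete case, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the construction used in the text: the $3 \times 3$ matrices $S_x$, $S_y$, $S_z$ of Sect. 7.11.2 stand in for $L_x$, $L_y$, $L_z$ (they obey the same commutation relations), units are chosen such that $\hbar = 1$, and the helper names are hypothetical. Starting from the eigenvector $(1, -i, 0)^t$ of $S_z$, the operator $T = S_x + i S_y$ is applied until the zero vector is hit.

```python
# Units with hbar = 1 (an assumption made for readability).
Sx = [[0, 0, 0], [0, 0, -1j], [0, 1j, 0]]
Sy = [[0, 0, 1j], [0, 0, 0], [-1j, 0, 0]]
Sz = [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def inner(u, v):
    """Complex inner product, conjugating the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def eigenvalue(M, v):
    """Rayleigh quotient (v, Mv) / (v, v): the eigenvalue when v is an eigenvector."""
    return (inner(v, matvec(M, v)) / inner(v, v)).real

# Ladder operator T = Sx + i Sy.
T = [[Sx[i][j] + 1j * Sy[i][j] for j in range(3)] for i in range(3)]

# Commutator [Sz, T] = Sz T - T Sz, expected to equal hbar * T = T.
SzT, TSz = matmul(Sz, T), matmul(T, Sz)
commutator = [[SzT[i][j] - TSz[i][j] for j in range(3)] for i in range(3)]

# Climb the ladder, starting from the eigenvector with the minimal eigenvalue.
u = [1, -1j, 0]
levels = []
while abs(inner(u, u)) > 1e-12:
    levels.append(round(eigenvalue(Sz, u), 12))
    u = matvec(T, u)

print(levels)
```

Each application of $T$ raises the eigenvalue by one unit of $\hbar$; after the maximal eigenvalue, the next application gives the zero vector, and the loop stops.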


7.10.4 Lowering an Eigenvalue 


How to lower $\lambda$? For this purpose, redefine $T$ as

\[ T \equiv L_x - i L_y. \]

It doesn’t commute with $L_z$. On the contrary: they have a nonzero commutator:

\[
\begin{aligned}
[L_z, T] &= [L_z, L_x - i L_y] \\
&= [L_z, L_x] - i [L_z, L_y] \\
&= i\hbar L_y - \hbar L_x \\
&= -\hbar T.
\end{aligned}
\]

Thanks to Sect. 7.6.1, we can now lower $\lambda$: if $Tu$ is a nonzero vector, then it is an eigenvector of $L_z$ as well, with a new eigenvalue: $\lambda - \hbar$. This way, our new $T$ serves as a new ladder operator, to help go down the ladder. Furthermore, this procedure can now repeat time and again, until hitting the zero vector, and the minimal eigenvalue of $L_z$.

So far, we’ve studied the eigenvalues and eigenvectors of $L_z$. The same can now be done for $L_x$ and $L_y$ as well. Let’s combine these matrices to form the entire angular momentum.


7.10.5 Angular Momentum 


Let’s place the above matrices as blocks in a new rectangular $3n \times n$ matrix:

\[ L \equiv \begin{pmatrix} L_x \\ L_y \\ L_z \end{pmatrix}. \]


This is the nondeterministic angular momentum. It mirrors the deterministic angular 
momentum r x p (Chap. 2, Sect.2.4.2). In the following exercises, we'll see a few 
interesting applications. 


7.11 Exercises 


7.11.1 Eigenvalues and Energy Levels 


1. What is an observable? Hint: a Hermitian $n \times n$ matrix.
2. May an observable have an eigenvalue with a nonzero imaginary part? Hint: a Hermitian matrix may have real eigenvalues only (Chap. 1, Sect. 1.9.4).
3. How many (linearly independent) eigenvectors does an observable have? Hint: $n$.
4. How many (distinct) eigenvalues may an observable have? Hint: at most $n$.
5. A degenerate observable has at least two (linearly independent) eigenvectors that share the same eigenvalue. How many distinct eigenvalues may such an observable have? Hint: at most $n - 1$.
6. Can the position matrix $X$ be degenerate? Hint: $X$ must have distinct elements on its main diagonal, to stand for distinct positions that the particle may take.
7. Consider the Hamiltonian of the harmonic oscillator (Sect. 7.7.1). Is it Hermitian?
8. Is it a legitimate observable?
9. May it have an eigenvalue with a nonzero imaginary part?
10. May it have a negative eigenvalue? Hint: the number operator must have a nonnegative expectation (Sects. 7.6.3–7.6.4).
11. What is the minimal eigenvalue of the Hamiltonian?
12. May it be zero? Hint: it must be bigger than the minimal eigenvalue of the number operator, which is zero.
13. May the harmonic oscillator have no energy at all?
14. What is the minimal energy of the harmonic oscillator? Hint: this is the minimal eigenvalue of the Hamiltonian.
15. What is the eigenvector corresponding to this eigenvalue? Hint: this is the ground state, the state of minimal energy.
16. What does the ground state look like geometrically? Hint: a Gaussian (Fig. 7.2).
17. What is the probability to have a certain amount of energy? Hint: with probability 1, it has the minimal energy. It can’t have any other energy level. This is deterministic.
18. Can the ground state change dynamically in time? Hint: no! Energy must remain at its minimum.
19. Consider some other eigenvector of the Hamiltonian. Can it change dynamically in time? Hint: no! Energy must remain the same eigenvalue of the Hamiltonian.
20. How much energy may the harmonic oscillator have? Hint: the allowed energy levels are the eigenvalues of the Hamiltonian.
21. To model the Hamiltonian well, what must the dimension $n$ be? Hint: $n$ had better be infinite. Indeed, one can start from the ground state, and raise eigenvalues time and again, designing infinitely many new states, with more and more energy.
22. In the harmonic oscillator, is there a maximal energy? Hint: the above process is unstoppable. Indeed, the zero vector is never hit (Sect. 7.6.7).
23. Consider an angular momentum component like $L_z$ (Sect. 7.10.1). Is it Hermitian?
24. Is it a legitimate observable?
25. Are its eigenvalues real? Hint: see Chap. 1, Sect. 1.9.4.
26. Are its eigenvectors orthogonal to each other? Hint: see Chap. 1, Sect. 1.9.5.
27. Normalize them to have norm 1, and be not only orthogonal but also orthonormal.
28. Consider some eigenvalue of $L_z$. Consider an $n$-dimensional state $v \in \mathbb{C}^n$. How likely is the $z$-angular momentum to be the same as the above eigenvalue? Hint: normalize $v$ to have norm 1. Then, take its inner product with the relevant (orthonormal) eigenvector. Finally, take the absolute value of this inner product, and square it up.
29. Must zero be an eigenvalue of $L_z$? Hint: yes. Look at the constant eigenvector, or any other grid function that is invariant under interchanging the $x$- and $y$-coordinates: $x \leftrightarrow y$ (Sect. 7.9.3).

30. Conclude that $L_z$ must have a nontrivial null space.
31. Given a positive eigenvalue of $L_z$, show that its negative counterpart must be an eigenvalue as well. Hint: interpret the eigenvector as a grid function. Interchange the $x$- and $y$-spatial coordinates: $x \leftrightarrow y$. This makes a new eigenvector, with the negative eigenvalue.
32. Show that $L_z$ has a few eigenvalues of the form

\[ 0, \pm\hbar, \pm 2\hbar, \pm 3\hbar, \ldots. \]

Hint: see Sects. 7.10.3–7.10.4.
33. Show that $L_z$ may in theory have a few more eigenvalues of the form

\[ \pm\frac{\hbar}{2}, \pm\frac{3\hbar}{2}, \pm\frac{5\hbar}{2}, \ldots. \]

Hint: use symmetry considerations to make sure that, in this (finite) list, the minimal and maximal eigenvalues have the same absolute value.


7.11.2 Spin 


1. Define new $3 \times 3$ matrices:

\[
S_x \equiv \hbar \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -i \\ 0 & i & 0 \end{pmatrix}, \quad
S_y \equiv \hbar \begin{pmatrix} 0 & 0 & i \\ 0 & 0 & 0 \\ -i & 0 & 0 \end{pmatrix}, \quad
S_z \equiv \hbar \begin{pmatrix} 0 & -i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
\]

where $i = \sqrt{-1}$ is the imaginary number.
2. Are these matrices Hermitian?
3. Are they legitimate observables?
4. Show that

\[
\begin{aligned}
[S_x, S_y] &= i\hbar S_z \\
[S_y, S_z] &= i\hbar S_x \\
[S_z, S_x] &= i\hbar S_y.
\end{aligned}
\]

5. Conclude that these matrices mirror the angular momentum components in Sect. 7.10.2.
6. Focus, for instance, on $S_z$. What are its eigenvectors and eigenvalues?
7. Show that $(1, i, 0)^t$ is an eigenvector of $S_z$, with the eigenvalue $\hbar$.
8. Interpret this eigenvector to point in the positive $z$-direction. Hint: its positive $y$-direction has a larger phase: it is at angle $\pi/2$ ahead of the positive $x$-direction (Fig. 7.6 and Sects. 7.2.4–7.2.5). Follow the right-hand rule: place your right hand with your thumb pointing in the positive $x$-direction, and your index finger pointing in the positive $y$-direction. This way, your middle finger will point in the positive $z$-direction (Chap. 2, Sect. 2.2.4).
9. Show that $(1, -i, 0)^t$ is an eigenvector as well, with the eigenvalue $-\hbar$.
10. Interpret this eigenvector to point in the negative $z$-direction (Fig. 7.7).
11. Normalize these eigenvectors to have norm 1. Hint: divide by $\sqrt{2}$.
12. Show that $(0, 0, 1)^t$ is an eigenvector as well, with the eigenvalue 0.
13. Conclude that this eigenvector is in the null space of $S_z$.
14. Show that these eigenvectors are orthogonal to each other. Hint: calculate their inner product, and don’t forget the complex conjugate.
15. Conclude that they are not only orthogonal but also orthonormal.
16. Is this as expected from a Hermitian matrix? Hint: yes, a Hermitian matrix must have real eigenvalues and orthonormal eigenvectors.
17. Is this also as expected from Sects. 7.10.3–7.10.4? Hint: yes, an eigenvalue may be raised or lowered by $\hbar$.
18. Consider a new physical system: a boson. This is an elementary particle with a new physical property: spin. This is a degenerate kind of angular momentum: it has no value, but just direction. As such, it is more mathematical than physical.
19. Interpret $S_z$ as a new observable, telling us the spin around the $z$-axis. This is a degenerate kind of angular momentum: the boson “spins” around the $z$-axis with no specific rate, but just with a specific direction: either counterclockwise (spin-up), or clockwise (spin-down).
20. How likely is the boson to have spin-up? Hint: take the current state $v \in \mathbb{C}^3$, normalize it to have norm 1, calculate its inner product with $(1, i, 0)^t/\sqrt{2}$, take the absolute value of this inner product, and square it up.
21. How likely is the boson to have spin-down? Hint: do the same with $(1, -i, 0)^t/\sqrt{2}$.
22. How likely is the boson to have spin-zero? Hint: $|v_3|^2$.
23. Repeat the above exercises for $S_x$ as well. This makes a new observable: left- or right-spin, around the $x$-axis.
24. Repeat the above exercises for $S_y$ as well. This makes a new observable: in- or out-spin, around the $y$-axis, pointing deep into the page (denoted by ‘$\otimes$’).

Fig. 7.6 How likely is the particle to have spin-up? Take the eigenvector $(1, i)^t/\sqrt{2}$ or $(1, i, 0)^t/\sqrt{2}$. Thanks to the right-hand rule, it points from the page towards your eye, as indicated by the ‘$\odot$’ at the origin. Then, calculate its inner product with the (normalized) state $v$. Finally, take the absolute value of this inner product, and square it up

Fig. 7.7 How likely is the particle to have spin-down? Take the eigenvector $(1, -i)^t/\sqrt{2}$ or $(1, -i, 0)^t/\sqrt{2}$. Thanks to the right-hand rule, it points deep into the page, as indicated by the ‘$\otimes$’ at the origin. Then, calculate its inner product with the (normalized) state $v$. Finally, take the absolute value of this inner product, and square it up
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7.11.3 Pauli Matrices 


1. The above spin is also called spin-one, because the maximal eigenvalue is 1 (times $\hbar$). Consider now a new physical system: a fermion. (For example, an electron or a proton or a neutron.)
2. This is a yet simpler kind of particle. It has a yet simpler kind of spin, called spin-one-half, because its maximal observation is $1/2$ (times $\hbar$).
3. For this purpose, consider a state of a yet smaller dimension: $v \in \mathbb{C}^2$. Define the $2 \times 2$ Pauli matrices:

\[
\sigma_x \equiv \frac{\hbar}{2} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad
\sigma_y \equiv \frac{\hbar}{2} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
\sigma_z \equiv \frac{\hbar}{2} \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}.
\]

4. Are these matrices Hermitian?
5. Are they legitimate observables?
6. Show that these matrices satisfy

\[
\begin{aligned}
[\sigma_x, \sigma_y] &= i\hbar \sigma_z \\
[\sigma_y, \sigma_z] &= i\hbar \sigma_x \\
[\sigma_z, \sigma_x] &= i\hbar \sigma_y.
\end{aligned}
\]

7. Conclude that these matrices mirror the angular momentum components in Sect. 7.10.2.
8. What are the eigenvectors of $\sigma_x$? Hint: the standard unit vectors $(1, 0)^t$ and $(0, 1)^t$.
9. Are they orthonormal?
10. What are their eigenvalues?
11. What are the eigenvectors of $\sigma_y$? Hint: $(1, 1)^t$ and $(1, -1)^t$.
12. Are they orthogonal to each other?
13. Normalize them to have norm 1.
14. What are their eigenvalues? Hint: $\pm\hbar/2$.
15. Is this as expected from a Hermitian matrix? Hint: yes, a Hermitian matrix must have real eigenvalues and orthonormal eigenvectors.
16. Is this also as expected from Sects. 7.10.3–7.10.4? Hint: yes, an eigenvalue may be raised or lowered by $\hbar$.
17. What are the eigenvectors of $\sigma_z$? Hint: $(1, i)^t$ and $(1, -i)^t$.
18. Are they orthogonal to each other? Hint: calculate their inner product, and don’t forget the complex conjugate.
19. Normalize them to have norm 1. Hint: divide by $\sqrt{2}$.
20. What are their eigenvalues?
21. As in spin-one above, interpret these eigenvectors to indicate spin-up or spin-down. Hint: see Figs. 7.6–7.7.
22. Show that the Pauli matrices have the same determinant:

\[ \det(\sigma_x) = \det(\sigma_y) = \det(\sigma_z) = -\frac{\hbar^2}{4}. \]

23. Show that the Pauli matrices have the same square:

\[ \sigma_x^2 = \sigma_y^2 = \sigma_z^2 = \frac{\hbar^2}{4} I, \]

where $I$ is the $2 \times 2$ identity matrix.
24. Does this agree with the eigenvalues calculated above? Hint: take an eigenvector, and apply the Pauli matrix twice.
25. Consider now a new physical system: two fermions (say, two electrons). Could their states be exactly the same? Hint: no! This is Pauli’s exclusion principle.


7.11.4 Polarization 


1. The Pauli matrices could be used to observe not only spin-one-half but also a completely different physical property, in a completely different physical system.
2. For this purpose, consider now a new physical system: a photon.
3. The photon is a boson. As such, it has a state in $\mathbb{C}^3$, to help specify its spin-one. Still, it also has yet another state in $\mathbb{C}^2$, to help specify yet another physical property: its polarization.
4. Indeed, the photon is not only a particle but also a light ray, or an electromagnetic wave. As such, it travels in some direction: say upwards, in the positive $z$-direction. At the same time, it also oscillates in the $x$-$y$ plane.
5. To help observe this, the Pauli matrices could also serve as new observables. In particular, $\sigma_x$ tells us how likely the photon is to oscillate in the $x$- or $y$-direction: in a (normalized) state $v = (v_1, v_2)^t$, the probability to oscillate in the $x$-direction is $|v_1|^2$, whereas the probability to oscillate in the $y$-direction is $|v_2|^2$.
6. At the same time, $\sigma_y$ tells us how likely the photon is to oscillate obliquely in the $x$-$y$ plane, at an angle of $45^\circ$ from the positive part of the $x$-axis. To calculate the probability to oscillate at angle $\pm 45^\circ$, just take the eigenvector $(1, \pm 1)^t/\sqrt{2}$, calculate its inner product with $v$, take the absolute value, and square it up. What do you get in terms of $v_1$ and $v_2$?
7. Finally, $\sigma_z$ tells us how likely the photon is to make circles in the $x$-$y$ plane (Figs. 7.6–7.7). To calculate the probability to make circles (counter)clockwise, take the eigenvector $(1, \pm i)^t/\sqrt{2}$, calculate its inner product with $v$, take the absolute value, and square it up. What do you get in terms of $v_1$ and $v_2$?


Part III 
Polynomials and Basis Functions 


The polynomial is an algebraic object as well. Indeed, two polynomials can be 
added to or multiplied by each other. Still, the polynomial is also an analytic object: 
it can be differentiated and integrated, as in calculus. These two aspects can now 
combine to design a special kind of function: basis function, useful in many practical 
applications. 

To study the polynomial, we can use tools from linear algebra. Indeed, a vector 
could be used to model a polynomial and store its coefficients. Still, the polyno- 
mial has more algebraic operations: multiplication and composition. This way, the 
polynomials make a new mathematical structure: a ring. 

In turn, polynomials of a certain degree make a new vector space. Basis functions 
can then be designed carefully to extend this space further. This way, they can help 
design a smooth spline in three spatial dimensions. This is the key to the finite-element 
method. 

Basis functions could be viewed as a special kind of vectors. Indeed, they span a new linear space, with all sorts of interesting properties. Later on, we’ll use them in a geometrical application: designing a new spline to approximate a given function on a mesh of tetrahedra. Furthermore, we’ll also use them in the finite-element method, to help solve complex models in quantum mechanics and general relativity.


Chapter 8
Polynomials and Their Gradient


The polynomial is a special kind of function, easy to deal with. We start with a 
polynomial of just one independent variable: x. We discuss a few algebraic operations 
that can be applied to it. In particular, we introduce a few algorithms to calculate the 
value of the polynomial at a given argument. 

Then, we move on to a more complicated case: polynomials of two (or even three) independent variables: $x$ and $y$ (and even $z$). Geometrically, we are now in a higher dimension: not only the one-dimensional axis $\mathbb{R}$, but also the two-dimensional plane $\mathbb{R}^2$, and even the three-dimensional space $\mathbb{R}^3$.

Our polynomials are then differentiated in these new domains. This way, we 
obtain more advanced analytical objects: partial and directional derivatives. Although 
we focus on polynomials, the discussion could be easily extended to more general 
functions as well. This will be used later in advanced applications in physics and 
chemistry. 


8.1 Polynomial of One Variable 


8.1.1 Polynomial of One Variable 


What is a polynomial? Well, a real polynomial is a real function $p : \mathbb{R} \to \mathbb{R}$, defined by

\[ p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = \sum_{i=0}^{n} a_i x^i, \]

where $n$ is a nonnegative integer number (the degree), and $a_0, a_1, a_2, \ldots, a_n$ are real numbers (the coefficients). Usually, it is assumed that $a_n \ne 0$. Otherwise, the leading term could be dropped.
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Thus, to define a concrete polynomial of degree n, it is sufficient to specify its 
coefficients: $a_0, a_1, a_2, \ldots, a_n$. This way, the polynomial is mirrored by the $(n + 1)$-dimensional vector

\[ (a_0, a_1, a_2, \ldots, a_n) \in \mathbb{R}^{n+1}. \]

What is a complex polynomial? Well, it is different from the above real polynomial in one aspect only: the coefficients $a_0, a_1, a_2, \ldots, a_n$ (as well as the independent variable $x$) can now be not only real but also complex numbers. This makes the polynomial a complex function $p : \mathbb{C} \to \mathbb{C}$, rather than a mere real function $p : \mathbb{R} \to \mathbb{R}$.

In this chapter, polynomials are studied as algebraic objects, with the arithmetic 
operations that can be applied to them: addition, multiplication, composition, etc. 
Furthermore, polynomials are also viewed as analytic objects that can be differenti- 
ated and integrated. This will be quite useful in many applications later on. 

Later on, the original polynomial of one independent variable will also be extended 
to more complicated polynomials of two or even three independent variables, along 
with the relevant algebraic and analytic operations: partial, normal, and tangential 
differentiation. 


8.1.2 Adding Polynomials 


The above interpretation of the polynomial $p(x)$ as an $(n + 1)$-dimensional vector is particularly useful in arithmetic operations. To see this, let $q(x)$ be yet another polynomial of degree $m$:

\[ q(x) = \sum_{i=0}^{m} b_i x^i. \]

Without loss of generality, assume that $m \le n$. (Otherwise, just interchange the roles of $p$ and $q$.) This leaves two possibilities: if $m = n$, then we can go ahead and add $p$ and $q$. If, on the other hand, $m < n$, then we must first define $n - m$ fictitious zero coefficients

\[ b_{m+1} = b_{m+2} = \cdots = b_n = 0, \]

to let $q$ have $n + 1$ coefficients as well.
We are now ready to define the new polynomial $p + q$:

\[ (p + q)(x) = p(x) + q(x) = \sum_{i=0}^{n} (a_i + b_i) x^i. \]

Thus, the original vectors have been added to each other coefficient by coefficient. This means that vectors indeed mirror polynomials in terms of addition and subtraction:

\[ (p - q)(x) = p(x) - q(x) = \sum_{i=0}^{n} (a_i - b_i) x^i. \]


8.1.3 Multiplying a Polynomial by a Scalar 


How to multiply the original polynomial by a given scalar $c$? Again, this is done coefficient by coefficient:

\[ (cp)(x) = c \cdot p(x) = c \sum_{i=0}^{n} a_i x^i = \sum_{i=0}^{n} (c a_i) x^i. \]

This way, the original vector of coefficients has been multiplied by $c$ component by component. Again, vectors mirror polynomials: not only in terms of addition but also in terms of scalar multiplication.
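
The vector picture translates directly into code. The following is a minimal sketch in Python, assuming (as an illustrative convention) that a polynomial is stored as a list whose entry `i` is the coefficient of $x^i$; the function names are hypothetical.

```python
def poly_add(a, b):
    """Add two coefficient lists, padding the shorter one with fictitious zeros."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def poly_scale(c, a):
    """Multiply every coefficient by the scalar c."""
    return [c * x for x in a]

p = [1, 2, 3]          # 1 + 2x + 3x^2
q = [4, 5]             # 4 + 5x

p_plus_q = poly_add(p, q)                    # 5 + 7x + 3x^2
p_minus_q = poly_add(p, poly_scale(-1, q))   # -3 - 3x + 3x^2
two_p = poly_scale(2, p)                     # 2 + 4x + 6x^2

print(p_plus_q, p_minus_q, two_p)
```

Subtraction needs no function of its own here: it is addition of $p$ and $(-1) \cdot q$.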


8.1.4 Multiplying Polynomials 


In the above, we’ve seen that a polynomial of degree n is mirrored by the (n + 1)- 
dimensional vector of its coefficients in terms of addition, subtraction, and multipli- 
cation by a scalar. Still, the polynomial is more than that: it has yet another algebraic 
operation—multiplication by another polynomial. 

In Chap. 1, Sect. 1.7.1, and Chap.2, Sects. 2.2.2, 2.2.3, we have seen that two 
vectors can be multiplied by each other in two different ways: inner product, which 
produces a mere scalar, or vector product, in which two three-dimensional vectors 
produce a new three-dimensional vector. Still, general vectors of higher dimension 
cannot be multiplied by each other to produce a new vector. In the context of poly- 
nomials, on the other hand, this is possible: 


(pq)(x) = p(x)q(x). 


After all, polynomials are just functions, which can be multiplied by each other. 
Still, we’re not done yet. Indeed, we are not only interested in the value of pq 

for a given argument x. After all, this is easily available: just calculate p(x) and 

q(x), and multiply. Here, however, we want much more than that: the entire vector 

of coefficients of the new polynomial pq. This vector is useful in many applications. 
The product of the two polynomials

\[ p(x) = \sum_{i=0}^{n} a_i x^i \quad \text{and} \quad q(x) = \sum_{j=0}^{m} b_j x^j \]

is defined by

\[ (pq)(x) = p(x) q(x) = \sum_{i=0}^{n} a_i x^i \sum_{j=0}^{m} b_j x^j = \sum_{i=0}^{n} \sum_{j=0}^{m} a_i b_j x^{i+j}. \]

Note that this double sum scans the $(n + 1) \times (m + 1)$ grid

\[ \{ (i, j) \mid 0 \le i \le n, \ 0 \le j \le m \}. \]

In this rectangular grid, each point of the form $(i, j)$ contributes a term of the form $a_i b_j x^{i+j}$. Still, it makes sense to sum the contributions diagonal by diagonal, as in Fig. 8.1. After all, on the $k$th diagonal, each point of the form $(i, j)$ satisfies $i + j = k$, and contributes $a_i b_j x^{i+j} = a_i b_{k-i} x^k$.

Fig. 8.1 How to multiply $p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3$ by $q(x) = b_0 + b_1 x + b_2 x^2$? Sum the terms diagonal by diagonal: in the $k$th diagonal, sum those terms with the power $x^k$ ($0 \le k \le 5$)

This way, the above grid is scanned diagonal by diagonal, rather than row by row. The diagonals are indexed by the new index $k = i + j = 0, 1, 2, \ldots, n + m$. The inner index, the row-index $i$, must never exceed the original grid:

\[ (pq)(x) = \sum_{i=0}^{n} \sum_{j=0}^{m} a_i b_j x^{i+j} = \sum_{k=0}^{n+m} \sum_{i=\max(0,k-m)}^{\min(k,n)} a_i b_{k-i} x^k. \]

Thus, the new polynomial $pq$ is associated with the vector

\[ (c_0, c_1, c_2, c_3, \ldots, c_{n+m}) \in \mathbb{R}^{n+m+1}, \]

where the individual coefficient $c_k$ is defined by

\[ c_k \equiv \sum_{i=\max(0,k-m)}^{\min(k,n)} a_i b_{k-i}. \]
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Let’s look at a simple example, in which g is a trivial polynomial of degree m = 0: 
\[ q(x) \equiv b_0. \]

In this case, the coefficients $c_k$ are simply

\[ c_k = \sum_{i=k}^{k} a_i b_{k-i} = a_k b_0 \]

($0 \le k \le n$). Thus, in this case,

\[ (pq)(x) = \sum_{k=0}^{n} c_k x^k = \sum_{k=0}^{n} b_0 a_k x^k = b_0 \sum_{i=0}^{n} a_i x^i = b_0 \cdot p(x). \]

This agrees with the original definition of scalar times polynomial: the scalar $b_0$ times the original polynomial $p(x)$ (Sect. 8.1.3).
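
The diagonal-by-diagonal formula for $c_k$ can be sketched in Python as follows. This is a minimal illustration, assuming that a polynomial is stored as a list whose entry `i` is the coefficient of $x^i$; the function name `poly_mul` is hypothetical.

```python
def poly_mul(a, b):
    """Coefficients of the product, summed diagonal by diagonal (k = i + j)."""
    n, m = len(a) - 1, len(b) - 1
    c = []
    for k in range(n + m + 1):
        # The row-index i must stay inside the (n + 1) x (m + 1) grid.
        lo, hi = max(0, k - m), min(k, n)
        c.append(sum(a[i] * b[k - i] for i in range(lo, hi + 1)))
    return c

p = [1, 2, 3]                   # 1 + 2x + 3x^2
q = [4, 5]                      # 4 + 5x

pq = poly_mul(p, q)             # 4 + 13x + 22x^2 + 15x^3
scaled = poly_mul(p, [7])       # degree m = 0: the same as the scalar 7 times p

print(pq, scaled)
```

The second call reproduces the special case above: a polynomial of degree 0 acts as a mere scalar.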


8.2 Horner’s Algorithm


8.2.1 Computing the Value of a Polynomial 


Consider now a new task: for a given argument x, how to compute the value of p(x)? 
The naive algorithm contains three stages: 


• First, calculate the individual monomials (powers of $x$):

\[ x^2, x^3, x^4, \ldots, x^n. \]

Fortunately, this can be done recursively:

\[ x^i = x \cdot x^{i-1} \quad (i = 2, 3, 4, \ldots, n). \]

This costs $n - 1$ scalar multiplications.
• Then, multiply each monomial by the corresponding coefficient, to obtain

\[ a_1 x, a_2 x^2, a_3 x^3, \ldots, a_n x^n. \]

This costs $n$ more multiplications.
• Finally, sum up:

\[ a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = p(x). \]

This costs $n$ scalar additions.
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Thus, the total cost is 2n — 1 multiplications and n additions. Could this cost be 


reduced? Fortunately, it could. For this purpose, one must introduce parentheses, 
and take a common factor out of them. 
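
The three stages of the naive algorithm, together with their cost, can be sketched as follows. This is a minimal Python illustration under stated assumptions: coefficients are stored in a list whose entry `i` multiplies $x^i$, and the function name and the explicit operation counters are hypothetical, added only to make the cost visible.

```python
def eval_naive(a, x):
    """Return (p(x), multiplications, additions) using the three naive stages."""
    n = len(a) - 1
    mults = adds = 0

    # Stage 1: powers of x, computed recursively as x^i = x * x^(i-1).
    powers = [1, x]
    for i in range(2, n + 1):
        powers.append(x * powers[i - 1])
        mults += 1

    # Stages 2 and 3: multiply each monomial by its coefficient, and sum up.
    value = a[0]
    for i in range(1, n + 1):
        value += a[i] * powers[i]
        mults += 1
        adds += 1
    return value, mults, adds

value, mults, adds = eval_naive([1, 2, 3, 4], 2)   # 1 + 2x + 3x^2 + 4x^3 at x = 2
print(value, mults, adds)
```

For degree $n = 3$, the counters give $2n - 1 = 5$ multiplications and $n = 3$ additions.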


8.2.2 Introducing Parentheses 


Consider the problem of computing 

ab + ac, 
where a, b, and c are some given numbers. The direct calculation requires two 
multiplications to calculate the individual products ab and ac, and one addition to 
sum up. Could this be done more efficiently? Yes—just use the distributive law: 
introduce parentheses, and take the common factor a out of them: 


ab+ac=a(b+c). 


To calculate the right-hand side, the cost is smaller: one addition to calculate b + c, 
and one multiplication to calculate a(b + c). 


8.2.3 Horner’s Algorithm 


The same idea can also help calculate

\[ p(x) = \sum_{i=0}^{n} a_i x^i \]

at a given argument $x$. Here, however, the task is a little more complicated: sum not just two but $n + 1$ terms. Although these terms share no common factor, the $n$ latter terms

\[ a_1 x, a_2 x^2, a_3 x^3, \ldots, a_n x^n \]

do: the common factor $x$, which can be taken out of parentheses:

\[ p(x) = a_0 + x p_1(x), \]

where the new polynomial $p_1(x)$ is defined by

\[ p_1(x) = a_1 + a_2 x + a_3 x^2 + \cdots + a_n x^{n-1} = \sum_{i=1}^{n} a_i x^{i-1}. \]

Better yet, if $a_1 = 0$, then we could take not only $x$ but also $x^2$ out of parentheses. In general, we could take the factor $x^k$ out of parentheses, for some (fixed maximal) $k \ge 1$:

\[ p(x) = a_0 + x^k p_k(x), \]

where the new polynomial $p_k(x)$ is of degree $n - k$:

\[ p_k(x) = a_k + a_{k+1} x + a_{k+2} x^2 + \cdots + a_n x^{n-k} = \sum_{i=k}^{n} a_i x^{i-k}. \]

This way, the value of $p_k(x)$ can be calculated recursively by the same algorithm itself. This is Horner’s algorithm.


8.2.4 Efficiency of Horner’s Algorithm 


Fortunately, Horner’s algorithm costs less: at most $n$ multiplications and $n$ additions. To see this, use mathematical induction on the degree $n$. Indeed, for $n = 0$, $p(x)$ is just the constant function: $p(x) \equiv a_0$, so there are $n = 0$ multiplications and $n = 0$ additions. Now, for $n = 1, 2, 3, \ldots$, assume that the induction hypothesis holds, so the calculation of $p_1(x)$ costs as little as $n - 1$ multiplications and $n - 1$ additions. Thus, calculating

\[ p(x) = a_0 + x p_1(x) \]

requires just one more multiplication to calculate $x p_1(x)$, and one more addition to calculate $a_0 + x p_1(x)$, which totals $n$ multiplications and $n$ additions at most, as asserted. This completes the induction step, and indeed the entire proof.


8.2.5 Composition of Polynomials 


To compose two given polynomials, just mirror Horner’s algorithm. The composition 
of the polynomials p and q is defined by 


$$(p \circ q)(x) = p(q(x)).$$

This formula is good enough to compute the value of $p \circ q$ at any given argument x:
first, compute q(x), then use it as an argument in p to compute p(q(x)). Still, here
we want more than that: we want to have the entire vector of coefficients of the new
polynomial $p \circ q$. After all, this vector is most useful in many applications.

To define the required vector of coefficients, use mathematical induction on the
degree of p: n. Indeed, for n = 0, p is just the constant function $p(x) = a_0$, so

$$(p \circ q)(x) = p(q(x)) = a_0$$

as well. Now, for n = 1, 2, 3, ..., assume that we already know how to obtain
the entire vector of coefficients of $p_k \circ q$, where $p_k$ is defined in Sect. 8.2.3. (This
is the induction hypothesis.) From Sect. 8.1.4, we already know how to multiply
polynomials. This helps compute $q^k$ (if k > 1). (Later on, we’ll see a yet better way
to compute $q^k$.) Furthermore, this also helps multiply $q^k$ times $p_k \circ q$. Finally, just
add $a_0$ to the first coefficient. This completes the induction step. This completes the
inductive (or recursive) Horner algorithm for composing two polynomials. 
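The composition can be sketched in a few lines, assuming again that a polynomial is a list of coefficients (lowest power first) and that k = 1 at every step. The helper names `poly_add`, `poly_mul`, and `compose` are hypothetical and stand in for the operations of Sects. 8.1.2 to 8.1.4.

```python
def poly_add(u, v):
    """Add two coefficient lists, padding the shorter one with zeros."""
    n = max(len(u), len(v))
    return [(u[i] if i < len(u) else 0) + (v[i] if i < len(v) else 0)
            for i in range(n)]


def poly_mul(u, v):
    """Multiply two coefficient lists (a plain convolution)."""
    if not u or not v:
        return []
    w = [0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            w[i + j] += ui * vj
    return w


def compose(p, q):
    """Coefficients of p(q(x)), using p o q = a_0 + q * (p_1 o q)."""
    result = [p[-1]] if p else []
    for a in reversed(p[:-1]):
        result = poly_mul(q, result)      # multiply q times (p_1 o q)
        result = poly_add([a], result)    # add a to the first coefficient
    return result
```

For example, with p(x) = 1 + x^2 and q(x) = 1 + x, `compose([1, 0, 1], [1, 1])` returns `[2, 2, 1]`, that is, 2 + 2x + x^2.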


8.3. Decimal and Binary Numbers 
8.3.1 Natural Number as a Polynomial 


Every natural number k is actually a polynomial. After all, it must satisfy

$$10^n \le k < 10^{n+1}$$

for some nonnegative integer number n. This leads to the decimal representation of
k as a finite list of digits:

$$a_n a_{n-1} a_{n-2} \cdots a_1 a_0,$$

which actually stands for the polynomial

$$k = a_0 + a_1 \cdot 10 + a_2 \cdot 10^2 + \cdots + a_n \cdot 10^n = \sum_{i=0}^{n} a_i \cdot 10^i = p(10).$$


In other words, the original natural number k is nothing but a polynomial, evaluated 
at the argument x = 10 (the decimal base). 


8.3.2 Binary Polynomial 


In the above, we’ve used base 10. Still, there is nothing special about this base: we 
could use base 2 as well. For this purpose, note that our natural number k must also 
satisfy 

$$2^m \le k < 2^{m+1}$$


for some nonnegative integer number m. This leads to the binary representation of 
k, which can actually be viewed as a binary polynomial q, evaluated at the argument 
x = 2 (the binary base). The coefficients of q are the binary digits

$$b_0, b_1, b_2, \ldots, b_m,$$

which are either 0 or 1. In its binary form, k is often written as a finite list of binary
digits (or bits):

$$b_m b_{m-1} b_{m-2} \cdots b_1 b_0.$$

This means that k is just the value of the polynomial q at 2:

$$k = b_0 + b_1 \cdot 2 + b_2 \cdot 2^2 + \cdots + b_m \cdot 2^m = \sum_{j=0}^{m} b_j 2^j = q(2).$$


Why is this useful? Well, in this form, even a very large k could be stored most 
efficiently on the computer. This is particularly important in coding—decoding algo- 
rithms in cryptography (Chap. 5 in [61]). 
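To make the correspondence between a number and its digit polynomial concrete, here is a small sketch. The function names `binary_digits` and `from_digits` are made up for this example; the second one is just Horner's algorithm applied at x = 2 (or at any other base).

```python
def binary_digits(k):
    """Return [b_0, b_1, ..., b_m] such that k = sum of b_j * 2**j (k >= 1)."""
    digits = []
    while k > 0:
        digits.append(k % 2)   # the unit binary digit
        k //= 2                # drop it and shift the rest down
    return digits


def from_digits(digits, base=2):
    """Evaluate the digit polynomial at the base, by Horner's algorithm."""
    value = 0
    for d in reversed(digits):
        value = d + base * value
    return value
```

For instance, `binary_digits(13)` gives `[1, 0, 1, 1]`, since 13 = 1 + 0·2 + 1·4 + 1·8, and `from_digits([1, 0, 1, 1])` recovers 13.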


8.4 Implicit Horner Algorithm 
8.4.1 Monomial: Individual Power 


Consider now a new task: how to calculate the value of a monomial? More specifically,
for a given x, how to calculate $x^k$ efficiently? For this purpose, the binary form
k = q(2) comes in handy, although implicitly.
The naive algorithm to compute $x^k$ is rather expensive: it requires k - 1 multiplications
to compute

$$x^2 = x \cdot x,\quad x^3 = x \cdot x^2,\quad \ldots,\quad x^k = x \cdot x^{k-1}.$$

Still, this algorithm has an advantage: as a by-product, we have not only $x^k$ but also
$x^2, x^3, x^4, \ldots, x^{k-1}$. This might be worthwhile if we need all these new monomials.
But what if we don’t? Is there a more efficient algorithm?

Fortunately, there is: Horner’s algorithm, in its implicit form. This way, $x^k$ can be
calculated in at most 2m multiplications, where $2^m \le k < 2^{m+1}$. Fortunately, 2m is
at most $2 \log_2 k$, which is usually far smaller than k - 1.


Indeed, from Horner’s algorithm, we have

$$k = q(2) = b_0 + 2 q_1(2),$$

where the degree of $q_1$ is smaller than the degree of q. This way,

$$x^k = x^{q(2)} = x^{b_0 + 2 q_1(2)} = x^{b_0} x^{2 q_1(2)} = x^{b_0} \left(x^2\right)^{q_1(2)} = \begin{cases} x \cdot \left(x^2\right)^{q_1(2)} & \text{if } b_0 = 1 \\ \left(x^2\right)^{q_1(2)} & \text{if } b_0 = 0. \end{cases}$$

Why is this efficient? Because to calculate the right-hand side, one needs just two
more multiplications:

• one to calculate $x^2$, to help calculate $(x^2)^{q_1(2)}$ recursively,
• and another one to multiply $(x^2)^{q_1(2)}$ by x (if $b_0 = 1$).

It is easy to prove by mathematical induction that the total cost is at most 2m
multiplications, where m is the degree of q.

To simplify the above algorithm, let’s eliminate q, and leave it implicit. After all,
$b_0$ is just the unit binary digit in

$$k = q(2) = b_0 + 2 q_1(2).$$

Thus, if k is even, then

$$b_0 = 0 \quad \text{and} \quad q_1(2) = \frac{k}{2}.$$

If, on the other hand, k is odd, then

$$b_0 = 1 \quad \text{and} \quad q_1(2) = \frac{k-1}{2}.$$

Thus, the algorithm takes the more explicit form

$$x^k = \begin{cases} x \cdot \left(x^2\right)^{(k-1)/2} & \text{if } k \text{ is odd} \\ \left(x^2\right)^{k/2} & \text{if } k \text{ is even.} \end{cases}$$

In this form, the algorithm uses k only, as required. q, on the other hand, is never
mentioned any more.

For every polynomial p, we can now use the same idea to compute $p^k$ efficiently:
just mirror the above algorithm. 
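The explicit form translates almost line by line into a recursive function. This is only a sketch: the name `power` is hypothetical, and the stopping cases k = 0 and k = 1 are an assumption added here so that the recursion terminates.

```python
def power(x, k):
    """Compute x**k for an integer k >= 0 by the implicit Horner algorithm."""
    if k == 0:
        return 1
    if k == 1:
        return x
    # (x^2)^(k // 2): here k // 2 equals k/2 if k is even, (k-1)/2 if k is odd.
    half = power(x * x, k // 2)
    # One extra multiplication by x only when the unit binary digit is 1.
    return x * half if k % 2 == 1 else half
```

Because only multiplication is used, the same function should also work when `x` is any object that supports `*`, not just a number.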


8.5 Differentiation and Integration 
8.5.1 Derivative of a Polynomial 


So far, we have treated the polynomial as an algebraic object, with arithmetic opera- 
tions like addition and multiplication. Furthermore, we also looked at it as a function, 
and calculated its value at a given argument x. Next, let’s treat it as an analytic func- 
tion, which can be differentiated and integrated. 

Consider again a polynomial of degree n: 


$$p(x) = \sum_{i=0}^{n} a_i x^i.$$

Its derivative is a new polynomial of degree max(0, n - 1):

$$p'(x) = \begin{cases} 0 & \text{if } n = 0 \\ \sum_{i=1}^{n} a_i i x^{i-1} = \sum_{i=0}^{n-1} (i+1) a_{i+1} x^i & \text{if } n > 0. \end{cases}$$


This is a new polynomial in x. As such, it can be differentiated as well to produce
the second derivative of p:

$$p''(x) = \left(p'(x)\right)'.$$

This is a new polynomial of degree max(0, n - 2). As such, it can be differentiated
as well to produce the third derivative of p, and so on. In general, for i = 0, 1, 2, ...,
the ith derivative of p is defined recursively by

$$p^{(i)} = \begin{cases} p & \text{if } i = 0 \\ \left(p^{(i-1)}\right)' & \text{if } i > 0. \end{cases}$$

In this notation, the zeroth derivative is just the function itself:

$$p^{(0)} = p,$$

and its derivative is

$$p^{(1)} = p'.$$

Furthermore, the nth derivative of p is just a constant, or a polynomial of degree 0:

$$p^{(n)} = a_n n!.$$

For this reason, every higher derivative must vanish:

$$p^{(i)} = 0, \quad i > n.$$
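On the vector of coefficients, differentiation is a simple shift. The following sketch assumes the list representation `coeffs[i]` = $a_i$; the names `derivative` and `nth_derivative` are illustrative only.

```python
def derivative(coeffs):
    """Coefficients of p', given p(x) = sum of coeffs[i] * x**i."""
    if len(coeffs) <= 1:
        return [0]                       # a constant has a zero derivative
    # The new i-th coefficient is (i + 1) * a_{i+1}.
    return [i * a for i, a in enumerate(coeffs)][1:]


def nth_derivative(coeffs, i):
    """Apply the recursive definition: differentiate i times."""
    for _ in range(i):
        coeffs = derivative(coeffs)
    return coeffs
```

For p(x) = 1 + 2x + 3x^2, the second derivative is the constant `[6]`, which agrees with $a_n n!$ for n = 2, and the third derivative is `[0]`.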


8.5.2 Indefinite Integral 


Could you guess a new function, whose derivative is p? This is called the primitive
function of p. It is also called the indefinite integral (not to be confused with the definite
integral, defined later). For our polynomial

$$p(x) = \sum_{i=0}^{n} a_i x^i,$$

the primitive function is a new polynomial of degree n + 1:

$$P(x) = \sum_{i=0}^{n} \frac{a_i}{i+1} x^{i+1} = \sum_{i=1}^{n+1} \frac{a_{i-1}}{i} x^i.$$


Indeed, the derivative of P is just the original polynomial p: 
P'(x) = p(x), 


as required. This is why P is also called the antiderivative of p. 


8.5.3 Definite Integral over an Interval 


The indefinite integral can now be used for a geometrical purpose: calculating an area 
in the Cartesian plane. For this purpose, the original polynomial p is also interpreted 
as a geometrical object. After all, it makes a graph in the x-y Cartesian plane. What 
is the area underneath this graph? 

More precisely, let’s bound the area from all four sides. For this purpose, assume 
that the graph of p is above the x-axis. This way, our area is already bounded from 
two sides: from below by the x-axis, and from above by the graph itself. To bound it 
from the left and the right as well, just issue two verticals from the x-axis upwards: 
at x = a on the left, and at x = b on the right (where a < b are some fixed real
numbers). 

What is the area of this region? It is just 


$$\int_a^b p(x)\,dx = P(b) - P(a).$$


This is the fundamental theorem of calculus. Note that this area may well be negative, 
if p is mostly negative, and its graph is mostly underneath the x-axis. 
Here are some elementary examples. If p(x) = x (a linear polynomial), then
$P(x) = x^2/2$, so

$$\int_a^b x\,dx = \frac{b^2 - a^2}{2} = (b - a)\frac{a + b}{2}.$$

This is just the length of the interval, times the value of p at its midpoint (which, for
a linear p, is also the average of its values at the two endpoints). This is
called the trapezoidal (or the trapezoid, or the trapezium) rule.
If, on the other hand, $p(x) = x^2$ (a quadratic polynomial), then $P(x) = x^3/3$, so

$$\int_a^b x^2\,dx = \frac{b^3 - a^3}{3}.$$

Finally, if p(x) = 1 is just the constant function, then P(x) = x, so

$$\int_a^b dx = b - a,$$


which is just the length of the original interval [a, b]. 
When a = 0 and b = 1, we have the unit interval [0, 1]. In this case, the above 
formula simplifies to read 


$$\int_0^1 p(x)\,dx = P(1) - P(0) = P(1) = \sum_{i=1}^{n+1} \frac{a_{i-1}}{i}.$$
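The primitive function and the definite integral can be sketched together. The code below assumes the same coefficient-list storage, uses exact fractions from the standard library to avoid rounding, and picks the primitive function with P(0) = 0; the names `primitive`, `evaluate`, and `integrate` are hypothetical.

```python
from fractions import Fraction


def primitive(coeffs):
    """Coefficients of P with P' = p and P(0) = 0."""
    return [Fraction(0)] + [Fraction(a) / (i + 1) for i, a in enumerate(coeffs)]


def evaluate(coeffs, x):
    """Horner's algorithm, in exact arithmetic."""
    result = Fraction(0)
    for a in reversed(coeffs):
        result = a + x * result
    return result


def integrate(coeffs, a, b):
    """Definite integral of p over [a, b], computed as P(b) - P(a)."""
    P = primitive(coeffs)
    return evaluate(P, b) - evaluate(P, a)
```

For example, `integrate([0, 0, 1], 0, 1)` returns `Fraction(1, 3)`, in agreement with the formula for p(x) = x^2 above.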


8.6 Polynomial of Two Variables 


8.6.1 Polynomial of Two Independent Variables 


So far, we’ve considered a polynomial of one independent variable: x. Next, let’s 
consider a polynomial of two independent variables: x and y. 
What is this? Well, a real polynomial of two variables is a function of the form 


$$p : \mathbb{R}^2 \to \mathbb{R}$$

that can be written as

$$p(x, y) = \sum_{i=0}^{n} a_i(x) y^i,$$

where x and y are real arguments, and $a_i(x)$ ($0 \le i \le n$) is a real polynomial in one
independent variable: x only.
Likewise, a complex polynomial in two independent variables is a function

$$p : \mathbb{C}^2 \to \mathbb{C}$$

with the same structure as above, except that x and y can now be not only real but
also complex numbers, and the polynomials $a_i(x)$ can now be complex polynomials
of one variable.


8.6.2 Arithmetic Operations 


How to carry out arithmetic operations between polynomials of two variables? The
same as before (Sects. 8.1.2–8.1.4). The only difference is that the $a_i$’s are no longer
scalars but polynomials (in x) in their own right. Fortunately, we already know how
to add or multiply them by each other.
In summary, consider two polynomials of two variables: 


$$p(x, y) = \sum_{i=0}^{n} a_i(x) y^i \quad \text{and} \quad q(x, y) = \sum_{j=0}^{m} b_j(x) y^j$$

(for some natural numbers $m \le n$). How to add them to each other? Well, if m < n,
then define a few dummy zero polynomials:

$$b_{m+1} = b_{m+2} = \cdots = b_n = 0.$$

We are now ready to add:

$$(p + q)(x, y) = p(x, y) + q(x, y) = \sum_{i=0}^{n} (a_i + b_i)(x) y^i,$$

where

$$(a_i + b_i)(x) = a_i(x) + b_i(x)$$

is just the sum of polynomials of one variable.
Furthermore, how to multiply p times q? Like this:

$$\begin{aligned} (pq)(x, y) &= p(x, y)\, q(x, y) \\ &= \left(\sum_{i=0}^{n} a_i(x) y^i\right) \left(\sum_{j=0}^{m} b_j(x) y^j\right) \\ &= \sum_{i=0}^{n} \left(\sum_{j=0}^{m} (a_i b_j)(x) y^{i+j}\right). \end{aligned}$$

This is just the sum of n + 1 polynomials of two variables, which we already know how
to do. 
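One possible way to store such a polynomial, assumed here purely for illustration, is a list of lists: `p[i]` is the coefficient list of $a_i(x)$. With this layout, the two-variable operations simply reuse the one-variable ones. The helper names (`add1`, `mul1`, `add2`, `mul2`) are made up.

```python
def add1(u, v):
    """Add two one-variable coefficient lists."""
    n = max(len(u), len(v))
    return [(u[i] if i < len(u) else 0) + (v[i] if i < len(v) else 0)
            for i in range(n)]


def mul1(u, v):
    """Multiply two one-variable coefficient lists."""
    if not u or not v:
        return []
    w = [0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            w[i + j] += ui * vj
    return w


def add2(p, q):
    """Add two-variable polynomials; missing a_i or b_i count as zero."""
    n = max(len(p), len(q))
    return [add1(p[i] if i < len(p) else [], q[i] if i < len(q) else [])
            for i in range(n)]


def mul2(p, q):
    """Multiply: the term (a_i b_j)(x) is accumulated at the power y**(i+j)."""
    if not p or not q:
        return []
    r = [[] for _ in range(len(p) + len(q) - 1)]
    for i, ai in enumerate(p):
        for j, bj in enumerate(q):
            r[i + j] = add1(r[i + j], mul1(ai, bj))
    return r
```

For example, with p = x + y stored as `[[0, 1], [1]]` and q = x - y stored as `[[0, 1], [-1]]`, the product comes out as `[[0, 0, 1], [0, 0], [-1]]`, that is, x^2 - y^2.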


8.7 Differentiation and Integration 


8.7.1 Partial Derivatives 


So far, we’ve looked at the polynomial

$$p(x, y) = \sum_{i=0}^{n} a_i(x) y^i$$


as an algebraic object, with two arithmetic operations: addition and multiplication. 
Fortunately, it can also be viewed as an analytic object, with a new analytic operation: 
partial differentiation. 

For this purpose, let’s view y as a fixed parameter, and differentiate p(x, y) as a 
function of x only. The result is called the partial derivative of p with respect to x: 


$$p_x(x, y) = \sum_{i=0}^{n} a_i'(x) y^i,$$

where $a_i'(x)$ is the derivative of $a_i(x)$.
Now, let’s work the other way around: view x as a fixed parameter, and differentiate 
p as a function of y only. This is the partial derivative of p with respect to y: 


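$$p_y(x, y) = \begin{cases} 0 & \text{if } n = 0 \\ \sum_{i=1}^{n} a_i(x)\, i\, y^{i-1} = \sum_{i=0}^{n-1} (i+1) a_{i+1}(x) y^i & \text{if } n > 0. \end{cases}$$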


Note that both partial derivatives are polynomials of two variables in their own right. 
Together, they make a pair, or a two-dimensional vector: the gradient. 


8.7.2. The Gradient 


Let $p_x$ serve as the first component, and $p_y$ as the second component in a new
two-dimensional vector. This makes the gradient of p at the point (x, y):

$$\nabla p(x, y) = \begin{pmatrix} p_x(x, y) \\ p_y(x, y) \end{pmatrix}.$$

Thus, the gradient of p is actually a vector function that not only takes but also returns
a two-dimensional vector:

$$\nabla p : \mathbb{R}^2 \to \mathbb{R}^2.$$
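Here is a small sketch of both partial derivatives and the gradient. It assumes a list-of-lists layout in which `p[i]` holds the coefficients of $a_i(x)$; this layout and all function names are choices made for this example only.

```python
def d1(u):
    """Derivative of a one-variable coefficient list."""
    return [i * a for i, a in enumerate(u)][1:] or [0]


def partial_x(p):
    """Differentiate every coefficient a_i(x) with respect to x."""
    return [d1(a) for a in p]


def partial_y(p):
    """The new i-th coefficient is (i + 1) * a_{i+1}(x)."""
    if len(p) <= 1:
        return [[0]]
    return [[i * c for c in a] for i, a in enumerate(p)][1:]


def eval1(u, x):
    """Horner's algorithm in x."""
    result = 0
    for a in reversed(u):
        result = a + x * result
    return result


def eval2(p, x, y):
    """Horner's algorithm in y, with each coefficient evaluated at x."""
    result = 0
    for a in reversed(p):
        result = eval1(a, x) + y * result
    return result


def gradient(p, x, y):
    """The pair (p_x, p_y), evaluated at the point (x, y)."""
    return (eval2(partial_x(p), x, y), eval2(partial_y(p), x, y))
```

For p(x, y) = x^2 y + 3y^2, stored as `[[0], [0, 0, 1], [3]]`, the gradient at (2, 3) is (2xy, x^2 + 6y) = (12, 22).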


8.7.3 Integral over the Unit Triangle 


As an analytic object, p(x, y) can be not only differentiated but also integrated. This 
could be viewed as an extension of the fundamental theorem of calculus (Sect. 8.5.3). 

Where is the integration carried out? For this purpose, consider the so-called unit
triangle

$$t = \{(x, y) \mid 0 \le x, y,\ x + y \le 1\}$$

(Fig. 8.2). This way, the unit triangle sits on its base: the unit interval [0, 1]. From
this interval, issue many verticals upwards, in the y-direction. To integrate on t, just
integrate on each and every individual vertical.

Fig. 8.2 The unit triangle t

Fig. 8.3 Integration over the unit triangle: for each fixed x, integrate over the vertical
$0 \le y \le 1 - x$

Our aim is to calculate the volume under the surface that p makes in the Cartesian 
space: the two-dimensional surface (or manifold) z = p(x, y) in the x-y-z space. 
For simplicity, assume that p is positive, so this surface lies above the x-y plane. 
What is the volume underneath it? More precisely, what is the volume between the 
surface and the horizontal x-y plane below it? 

Still, to calculate a volume, we must be yet more precise. Our three-dimensional 
region must be bounded not only from above and below but also from all other 
sides. For this purpose, from the unit triangle t, issue three “walls” upwards, in the
z-direction. This way, we get what we wanted: a closed three-dimensional region.
What is its volume? It is just the integral of p over t:

$$\iint_t p(x, y)\,dx\,dy.$$


Note that this volume may well be negative, if p is mostly negative. For simplicity, 
however, we assume that p is positive. 

To calculate this integral, let $0 \le x \le 1$ be a fixed parameter, as in Fig. 8.3.
Furthermore, let P(x, y) be the indefinite integral of p(x, y) with respect to y:

$$P(x, y) = \sum_{i=0}^{n} \frac{a_i(x)}{i+1} y^{i+1} = \sum_{i=1}^{n+1} \frac{a_{i-1}(x)}{i} y^i.$$

This way, P(x, y) is characterized by the property that its partial derivative with
respect to y is the original polynomial p(x, y):

$$P_y(x, y) = p(x, y).$$

Fortunately, we’ve already split t into many verticals, issuing from its base upwards,
in the y-direction. To integrate over t, just integrate on each and every vertical:

$$\iint_t p(x, y)\,dx\,dy = \int_0^1 \left(\int_0^{1-x} p(x, y)\,dy\right) dx = \int_0^1 P(x, 1 - x)\,dx.$$

(The last equality holds because P(x, 0) = 0.)


Thanks to the fundamental theorem of calculus, we already know how to calculate 
this. This is indeed the volume of our three-dimensional region, as required. 
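The whole procedure fits in a short sketch: build P(x, 1 - x) as a polynomial in x only, then integrate it over [0, 1]. The list-of-lists layout (`p[i]` holds the coefficients of $a_i(x)$) and the function names are assumptions made for this example; exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def add1(u, v):
    n = max(len(u), len(v))
    return [(u[i] if i < len(u) else 0) + (v[i] if i < len(v) else 0)
            for i in range(n)]


def mul1(u, v):
    if not u or not v:
        return []
    w = [0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            w[i + j] += ui * vj
    return w


def integrate_unit_triangle(p):
    """Integral of p over the unit triangle, via P(x, 1 - x)."""
    one_minus_x = [Fraction(1), Fraction(-1)]
    power = [Fraction(1)]                  # (1 - x)**0
    integrand = []                         # will hold P(x, 1 - x)
    for i, a in enumerate(p):
        power = mul1(power, one_minus_x)   # now (1 - x)**(i + 1)
        scaled = [Fraction(c) / (i + 1) for c in a]
        integrand = add1(integrand, mul1(scaled, power))
    # On [0, 1], the integral of c_j * x**j is c_j / (j + 1).
    return sum((Fraction(c) / (j + 1) for j, c in enumerate(integrand)),
               Fraction(0))
```

As a sanity check, the constant polynomial 1 integrates to 1/2 (the area of the triangle), and p(x, y) = xy integrates to 1/24.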


8.7.4 Second Partial Derivatives 


Let us now return to differentiation. We’ve already differentiated p with respect to x
and y to produce the partial derivatives $p_x$ and $p_y$. Fortunately, these are polynomials
of two variables in their own right. As such, they can be differentiated as well, to
produce the second partial derivatives of p. For example, the mixed partial derivative
of p is

$$p_{xy}(x, y) = \left(p_x(x, y)\right)_y.$$


From Sect. 8.7.1, partial differentiation (or derivation) is insensitive to the order in 
which it is carried out: 


$$p_{xy}(x, y) = \sum_{i=0}^{n-1} (i+1) a_{i+1}'(x) y^i = p_{yx}(x, y).$$

This is also called the (1, 1)st partial derivative of p. After all, $x^1 = x$ and $y^1 = y$,
so

$$p_{xy}(x, y) = p_{x^1 y^1}(x, y).$$

With this notation, the (0, 0)th partial derivative of p is nothing but p itself:

$$p_{x^0 y^0}(x, y) = p(x, y).$$

The process may now continue yet more: differentiate a second partial derivative,
and obtain a new partial derivative of order three. For example, the (2, 1)st partial
derivative of p is

$$p_{x^2 y^1}(x, y) = p_{xxy}(x, y) = \left(p_{xx}(x, y)\right)_y.$$


In general, the (i, j)th partial derivative of p is defined diagonal by diagonal, using
mathematical induction on its order: i + j = 0, 1, 2, 3, ... (Fig. 8.4):

$$p_{x^i y^j} = \begin{cases} p & \text{if } i = j = 0 \\ \left(p_{x^{i-1} y^j}\right)_x & \text{if } i > 0 \\ \left(p_{x^i y^{j-1}}\right)_y & \text{if } j > 0. \end{cases}$$

Fig. 8.4 To define the (i, j)th partial derivative, march diagonal by diagonal: use mathematical
induction on i + j = 0, 1, 2, 3, ...


Fortunately, if both i > 0 and j > 0, then these formulas agree with each other.
Indeed, in the same mathematical induction, we could also prove that reordering
doesn’t matter:

$$\left(p_{x^{i-1} y^j}\right)_x = \left(p_{x^{i-1} y^{j-1}}\right)_{yx} = \left(p_{x^{i-1} y^{j-1}}\right)_{xy} = \left(p_{x^i y^{j-1}}\right)_y.$$


This completes the induction step, and indeed the entire definition, as required. 

To count the partial derivatives, let’s use some results from discrete math. How 
many distinct partial derivatives of order up to (and including) m are there? Well, 
from Sect. 10.15 in [60, 63], the answer is 


$$\binom{m+2}{2} = \frac{(m+2)!}{2! \cdot m!} = \frac{(m+1)(m+2)}{2}.$$


How many distinct partial derivatives of order m exactly are there? The answer to
this is

$$\binom{m+2-1}{2-1} = \binom{m+1}{1} = m + 1.$$

Indeed, here they are:

$$p_{x^0 y^m},\ p_{x^1 y^{m-1}},\ p_{x^2 y^{m-2}},\ \ldots,\ p_{x^m y^0}.$$
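These counts are easy to confirm by brute force. The short sketch below simply lists the index pairs (i, j) and compares their number with the binomial coefficients; the function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def orders_up_to(m):
    """All pairs (i, j) with i + j <= m."""
    return [(i, j) for i in range(m + 1) for j in range(m + 1 - i)]


def orders_exactly(m):
    """All pairs (i, j) with i + j == m."""
    return [(i, m - i) for i in range(m + 1)]


# For m = 3: ten derivatives of order up to 3, four of order exactly 3.
count_up_to_3 = len(orders_up_to(3))
count_exactly_3 = len(orders_exactly(3))
```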


8.7.5 Degree 


The original form 


$$p(x, y) = \sum_{i=0}^{n} a_i(x) y^i$$

is somewhat incomplete: the degree is not necessarily n. To uncover the degree, a
yet more explicit form is needed.
For this purpose, one must write each polynomial $a_i(x)$ more explicitly:

$$a_i(x) = \sum_{j \ge 0} a_{i,j} x^j,$$

where the $a_{i,j}$’s are some scalars. This way, p can now be written as

$$p(x, y) = \sum_{i \ge 0} a_i(x) y^i = \sum_{i \ge 0} \sum_{j \ge 0} a_{i,j} x^j y^i.$$

The degree of p is the maximal sum i + j for which there is here a nontrivial
monomial of the form $a_{i,j} x^j y^i$ (with $a_{i,j} \ne 0$). Note that, unlike in a polynomial of
one variable, here the degree may be greater than n.

In a polynomial p(x, y) of degree m, what is the maximal number of distinct
monomials? Well, this is the same as the total number of distinct pairs of the form
(i, j), with $i + j \le m$. From Sect. 10.15 in [60, 63], this number is just

$$\binom{m+2}{2} = \frac{(m+2)!}{m! \cdot 2!} = \frac{(m+1)(m+2)}{2}.$$


8.8 Polynomial of Three Variables 


8.8.1 Polynomial of Three Independent Variables 


A polynomial of three independent variables is defined by

$$p(x, y, z) = \sum_{i=0}^{n} a_i(x, y) z^i,$$

where the coefficients $a_i$ are now polynomials of two independent variables: x and
y.
How to add, subtract, or multiply polynomials of three variables? Fortunately,
this is done in the same way as in Sects. 8.1.2–8.1.4. There is just one change: the
$a_i$’s are now polynomials of two variables in their own right. Fortunately, we already
know how to “play” with them algebraically.


8.9 Differentiation and Integration 


8.9.1 Partial Derivatives 


Let us view both y and z as fixed parameters, and differentiate p(x, y, z) as a function
of x only. This produces the partial derivative with respect to x:

$$p_x(x, y, z) = \sum_{i=0}^{n} (a_i)_x(x, y) z^i.$$

In this sum, the coefficients $a_i$ are differentiated with respect to x as well.

Similarly, let us view both x and z as fixed parameters, and differentiate p(x, y, z)
as a function of y only. This produces the partial derivative with respect to y:

$$p_y(x, y, z) = \sum_{i=0}^{n} (a_i)_y(x, y) z^i.$$

Finally, let us view both x and y as fixed parameters, and differentiate p(x, y, z) as
a function of z only. This produces the partial derivative with respect to z:

$$p_z(x, y, z) = \begin{cases} 0 & \text{if } n = 0 \\ \sum_{i=1}^{n} a_i(x, y)\, i\, z^{i-1} = \sum_{i=0}^{n-1} (i+1) a_{i+1}(x, y) z^i & \text{if } n > 0. \end{cases}$$


Together, these partial derivatives make a new three-dimensional vector: the gradient. 


8.9.2 The Gradient 


Once placed in a three-dimensional vector, these partial derivatives form the gradient
of p:

$$\nabla p(x, y, z) = \begin{pmatrix} p_x(x, y, z) \\ p_y(x, y, z) \\ p_z(x, y, z) \end{pmatrix}.$$

Often, the gradient is nonconstant: it may change from point to point. Only when p
is linear is its gradient constant.
Thus, the gradient of p is actually a vector function (or field):

$$\nabla p : \mathbb{R}^3 \to \mathbb{R}^3.$$


8.9.3 Vector Field (or Function) 


A vector field could actually be even more general than that. For this purpose, consider 
three real functions (not necessarily polynomials): 


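$$\begin{aligned} f &= f(x, y, z) \\ g &= g(x, y, z) \\ h &= h(x, y, z). \end{aligned}$$

Let’s place them in a new three-dimensional vector. This makes a new vector function:

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \to \begin{pmatrix} f \\ g \\ h \end{pmatrix} = \begin{pmatrix} f(x, y, z) \\ g(x, y, z) \\ h(x, y, z) \end{pmatrix}.$$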


This is also called a vector field. In what follows, we consider a differentiable vector 
field, in which f, g, and h are differentiable functions. 


8.9.4 The Jacobian 


So far, we’ve defined the gradient of the polynomial p. This is a column vector. The 
transpose gradient, on the other hand, is a row vector: 


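$$\nabla^t p(x, y, z) = \left(p_x(x, y, z),\ p_y(x, y, z),\ p_z(x, y, z)\right).$$

Let us now apply the transpose gradient to a vector field, row by row:

$$\nabla^t \begin{pmatrix} f \\ g \\ h \end{pmatrix} = \begin{pmatrix} \nabla^t f \\ \nabla^t g \\ \nabla^t h \end{pmatrix} = \begin{pmatrix} f_x & f_y & f_z \\ g_x & g_y & g_z \\ h_x & h_y & h_z \end{pmatrix}.$$

This 3 x 3 matrix is the Jacobian of the original vector field.
In summary, the original mapping

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \to \begin{pmatrix} f \\ g \\ h \end{pmatrix}$$

has the Jacobian matrix

$$\frac{\partial(f, g, h)}{\partial(x, y, z)} = \nabla^t \begin{pmatrix} f \\ g \\ h \end{pmatrix} = \begin{pmatrix} f_x & f_y & f_z \\ g_x & g_y & g_z \\ h_x & h_y & h_z \end{pmatrix}.$$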


(The dependence on the spatial variables x, y, and z is often omitted for short.) The 
Jacobian matrix will be particularly useful in integration. 
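For a general differentiable vector field, no formula for the partial derivatives may be at hand. One common workaround, which is not the analytic definition above but a numerical approximation swapped in for illustration, is to estimate each column of the Jacobian by central differences. The function name `jacobian` and the step size `h` are assumptions of this sketch.

```python
def jacobian(field, point, h=1e-6):
    """Approximate the Jacobian of field at point by central differences.

    field takes (x, y, z) and returns a tuple such as (f, g, h).
    """
    rows = len(field(*point))
    J = [[0.0] * 3 for _ in range(rows)]
    for col in range(3):                   # col 0, 1, 2 stands for x, y, z
        plus, minus = list(point), list(point)
        plus[col] += h
        minus[col] -= h
        f_plus, f_minus = field(*plus), field(*minus)
        for row in range(rows):
            J[row][col] = (f_plus[row] - f_minus[row]) / (2 * h)
    return J


def example_field(x, y, z):
    """A sample vector field: f = xy, g = yz, h = x + z."""
    return (x * y, y * z, x + z)
```

At the point (1, 2, 3), the exact Jacobian of `example_field` has the rows (y, x, 0) = (2, 1, 0), (0, z, y) = (0, 3, 2), and (1, 0, 1); the approximation should agree up to rounding error.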


8.9.5 Integral over the Unit Tetrahedron 


The unit tetrahedron T is a three-dimensional region, with four corners (or vertices):
(0, 0, 0), (1, 0, 0), (0, 1, 0), and (0, 0, 1)

(Fig. 8.5). Furthermore, T is bounded by four triangles (faces or sides). In particular,
T sits on its base: the unit triangle t in Fig. 8.2. In terms of analytic geometry, T is
defined by

$$T = \{(x, y, z) \mid 0 \le x, y, z,\ x + y + z \le 1\}.$$

The original polynomial

$$p(x, y, z) = \sum_{i=0}^{n} a_i(x, y) z^i$$

can now be integrated in T. For this purpose, from each point on the base of T,
just issue a vertical upwards, in the z-direction, until it hits the upper face of T. To
integrate in T, just integrate on each and every individual vertical.

Fig. 8.5 The unit tetrahedron T

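To carry out this plan, let P(x, y, z) be the indefinite integral of p(x, y, z) with
respect to z:

$$P(x, y, z) = \sum_{i=0}^{n} \frac{a_i(x, y)}{i+1} z^{i+1} = \sum_{i=1}^{n+1} \frac{a_{i-1}(x, y)}{i} z^i.$$

With this new definition, we are now ready to integrate:

$$\begin{aligned} \iiint_T p(x, y, z)\,dx\,dy\,dz &= \iint_t \left(\int_0^{1-x-y} p(x, y, z)\,dz\right) dx\,dy \\ &= \iint_t \left(P(x, y, 1 - x - y) - P(x, y, 0)\right) dx\,dy \\ &= \iint_t P(x, y, 1 - x - y)\,dx\,dy. \end{aligned}$$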


Fortunately, this is just a two-dimensional integration in t, which we already know 
how to do (Sect. 8.7.3). 
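As a cross-check on the vertical-by-vertical procedure, one can use a different route: the classical closed formula for a monomial over the unit tetrahedron, namely i! j! k! / (i + j + k + 3)! for x^i y^j z^k. The sketch below applies that formula term by term. It assumes the polynomial is stored as a dictionary mapping exponent triples (i, j, k) to coefficients; this layout and the function name are choices made here, not the text's.

```python
from fractions import Fraction
from math import factorial


def integrate_unit_tetrahedron(p):
    """Integrate p over T, monomial by monomial.

    p maps (i, j, k) to the coefficient of x**i * y**j * z**k.
    """
    total = Fraction(0)
    for (i, j, k), c in p.items():
        weight = Fraction(factorial(i) * factorial(j) * factorial(k),
                          factorial(i + j + k + 3))
        total += Fraction(c) * weight
    return total
```

For the constant polynomial 1, this gives 1/6, which is indeed the volume of the unit tetrahedron.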


8.10 Normal and Tangential Derivatives 


8.10.1 Directional Derivative 


Let us now return to the subject of differentiation. So far, we’ve differentiated
p(x, y, z) in a Cartesian direction: x, y, or z. This produces the three partial derivatives.
Let us now generalize this, and differentiate p in just any spatial direction as
well.

For this purpose, let n be a fixed three-dimensional vector in $\mathbb{R}^3$:

$$n = \begin{pmatrix} n_1 \\ n_2 \\ n_3 \end{pmatrix} = (n_1, n_2, n_3)^t \in \mathbb{R}^3.$$

Assume also that n is a unit vector:

$$\|n\|_2 = \sqrt{n_1^2 + n_2^2 + n_3^2} = 1.$$

Define $p_n(x, y, z)$ as the directional derivative of p(x, y, z) in the direction pointed
at by n. This is the inner product of n with the gradient of p at (x, y, z):

$$p_n(x, y, z) = (n, \nabla p(x, y, z)) = n^t \nabla p(x, y, z) = n_1 p_x(x, y, z) + n_2 p_y(x, y, z) + n_3 p_z(x, y, z).$$

For short, we often omit the dependence on the specific point (x, y, z):

$$p_n = (n, \nabla p) = n^t \nabla p = n_1 p_x + n_2 p_y + n_3 p_z.$$
This still depends on (x, y, z) implicitly, and may still change from point to point. 


Only when p is linear is the directional derivative constant. 
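The directional derivative can be sketched as an inner product, exactly as in the formula above. The code assumes a polynomial of three variables stored as a dictionary from exponent triples (i, j, k) to coefficients; this storage and the function names are hypothetical.

```python
from math import sqrt


def partial(p, axis):
    """Differentiate p with respect to x (axis 0), y (axis 1), or z (axis 2)."""
    result = {}
    for exps, c in p.items():
        e = exps[axis]
        if e > 0:
            new = list(exps)
            new[axis] = e - 1
            key = tuple(new)
            result[key] = result.get(key, 0) + e * c
    return result


def evaluate(p, x, y, z):
    """Sum all monomials c * x**i * y**j * z**k."""
    return sum(c * x**i * y**j * z**k for (i, j, k), c in p.items())


def directional_derivative(p, n, point):
    """Inner product of the unit vector n with the gradient of p at point."""
    grad = [evaluate(partial(p, axis), *point) for axis in range(3)]
    return sum(n_i * g_i for n_i, g_i in zip(n, grad))
```

For p = x^2 + yz at the point (1, 2, 3), the gradient is (2, 3, 2). In the direction n = (0, 0, 1) the derivative is 2, and in the direction n = (1, 1, 0)/sqrt(2) it is 5/sqrt(2).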
Next, let’s look at an interesting special case. 


8.10.2. Normal Derivative 


Assume now that

$$n = (n_1, n_2, n_3)^t$$

is normal (or orthogonal, or perpendicular) to a particular plane in $\mathbb{R}^3$. In other words,
n makes a zero inner product with the difference between any two points that lie on
the plane. In this case, the above directional derivative is also called the normal derivative.
As a matter of fact, n could be normal to a mere line in $\mathbb{R}^3$. For example, consider
the line

$$\{(x, y, 0) \mid x + y = 1\} \subset \mathbb{R}^3.$$

(This line contains one of the edges in the unit tetrahedron in Fig. 8.5.) Consider two
distinct points on this line:

$$(x, 1 - x, 0) \quad \text{and} \quad (\tilde{x}, 1 - \tilde{x}, 0),$$

where $0 \le x, \tilde{x} \le 1$, and $x \ne \tilde{x}$. The difference between these two points is just

$$(x, 1 - x, 0) - (\tilde{x}, 1 - \tilde{x}, 0) = (x - \tilde{x},\ 1 - x - (1 - \tilde{x}),\ 0) = (x - \tilde{x},\ \tilde{x} - x,\ 0).$$

This difference is orthogonal to two constant vectors:

$$(1, 1, 0)^t \quad \text{and} \quad (0, 0, 1)^t.$$

Thus, to be normal to the above line, n could be either

$$n = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \quad \text{or} \quad n = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix},$$


or just any (normalized) linear combination of these two vectors. 
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8.10.3 Differential Operator 


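It is sometimes convenient to use differential operators: $\partial/\partial x$ means partial differentiation
with respect to x, $\partial/\partial y$ means partial differentiation with respect to y, and
$\partial/\partial z$ means partial differentiation with respect to z. With these new notations, the
operator of normal differentiation can be written as

$$\frac{\partial}{\partial n} = n_1 \frac{\partial}{\partial x} + n_2 \frac{\partial}{\partial y} + n_3 \frac{\partial}{\partial z}.$$

For example, if

$$n = \frac{1}{\sqrt{2}} (1, 1, 0)^t,$$

then the operator of normal differentiation takes the form

$$\frac{\partial}{\partial n} = \frac{1}{\sqrt{2}} \left(\frac{\partial}{\partial x} + \frac{\partial}{\partial y}\right).$$

For yet another example, consider the plane

$$\{(x, y, z) \mid x + y + z = 1\} \subset \mathbb{R}^3.$$

(This plane contains the upper face of the unit tetrahedron in Fig. 8.5.) The normal
vector to this plane is

$$n = \frac{1}{\sqrt{3}} (1, 1, 1)^t.$$

Thus, in this case, the operator of normal differentiation is just

$$\frac{\partial}{\partial n} = \frac{1}{\sqrt{3}} \left(\frac{\partial}{\partial x} + \frac{\partial}{\partial y} + \frac{\partial}{\partial z}\right).$$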


8.10.4 High-Order Normal Derivatives 


Because the normal derivative of a polynomial is a polynomial in its own right, it has 
a normal derivative as well. The normal derivative of the normal derivative is called 
the second normal derivative. 

This can be extended to a yet higher order. Indeed, by mathematical induction on
i = 1, 2, 3, ..., the (i + 1)st normal derivative is just the normal derivative of the ith
normal derivative.
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8.10.5 Tangential Derivative 


So far, we have assumed that n was normal to a given line or plane in the three- 
dimensional Cartesian space. Assume now that n is no longer normal but rather 
parallel to the line or the plane. This way, n is orthogonal to every vector that is 
normal to the original line or plane. In fact, if n was shifted to issue from the original 
line or plane rather than from the origin, then it would be contained in that line or 
plane, and would indeed be tangent to it as well. 

The directional derivative in the direction pointed at by n is then called the tangen- 
tial derivative. Furthermore, we also have a yet higher order: the tangential derivative 
of the tangential derivative is called the second tangential derivative, or the tangential 
derivative of order 2. 

Again, this is just mathematical induction: the zeroth tangential derivative is the
original function itself. Now, for i = 0, 1, 2, ..., the (i + 1)st tangential derivative
(or the tangential derivative of order i + 1) is defined as the tangential derivative of
the ith tangential derivative. This will be useful later in the book.


8.11 High-Order Partial Derivatives 


8.11.1 High-Order Partial Derivatives 


For polynomials of two variables, high-order partial derivatives have already been 
defined in Sect. 8.7.4. For polynomials of three variables, on the other hand, high- 
order partial derivatives have also been used implicitly in Sect. 8.10.4. Here, however, 
we define them more explicitly, including mixed partial derivatives. 

For this purpose, recall that the partial derivative of a polynomial of three variables
is a polynomial of three variables in its own right. As such, it can be differentiated
once again. For example, $p_x$ can be differentiated with respect to z to produce

$$p_{xz}(x, y, z) = \left(p_x(x, y, z)\right)_z.$$

From Sect. 8.9.1, the order in which the partial differentiation takes place is immaterial:

$$p_{xz}(x, y, z) = \sum_{i=0}^{n-1} (i+1) (a_{i+1})_x(x, y) z^i = p_{zx}(x, y, z).$$

Furthermore, let’s use differential operators (Sect. 8.10.3) to define the (i, j, k)th
partial derivative:

$$p_{x^i y^j z^k} = \left(\left(\frac{\partial}{\partial x}\right)^i \left(\frac{\partial}{\partial y}\right)^j \left(\frac{\partial}{\partial z}\right)^k\right) p.$$

For example, the (2, 1, 0)th partial derivative of p is just

$$p_{x^2 y^1 z^0}(x, y, z) = p_{xxy}(x, y, z).$$

In particular, the (0, 0, 0)th partial derivative of p is just p itself:

$$p_{x^0 y^0 z^0}(x, y, z) = p(x, y, z).$$

The order of the (i, j, k)th partial derivative is the sum i + j + k. With this terminology,
the (i, j, k)th partial derivative could have been defined more explicitly by
mathematical induction on the order i + j + k = 0, 1, 2, 3, ...:

$$p_{x^i y^j z^k} = \begin{cases} p & \text{if } i = j = k = 0 \\ \left(p_{x^{i-1} y^j z^k}\right)_x & \text{if } i > 0 \\ \left(p_{x^i y^{j-1} z^k}\right)_y & \text{if } j > 0 \\ \left(p_{x^i y^j z^{k-1}}\right)_z & \text{if } k > 0. \end{cases}$$


As discussed in Sect. 8.7.4, the same mathematical induction could also have been 
used to prove that these definitions always agree with each other. 

To count the partial derivatives, let’s use some results from discrete math. Thanks 
to Sect. 10.15 in [60, 63], the total number of partial derivatives of order up to (and 
including) m is 


$$\binom{m+3}{3} = \frac{(m+3)!}{3! \cdot m!} = \frac{(m+1)(m+2)(m+3)}{6}.$$

Furthermore, the total number of partial derivatives of order m exactly is

$$\binom{m+3-1}{3-1} = \binom{m+2}{2} = \frac{(m+2)!}{2! \cdot m!} = \frac{(m+1)(m+2)}{2}.$$


8.11.2 The Hessian 


The second partial derivatives defined above can now be placed in a new 3 x 3 matrix. 
This is the Hessian. 

For this purpose, recall that the transpose gradient of p is the row vector containing 
the partial derivatives of p: 


$$\nabla^t p = (\nabla p)^t = (p_x, p_y, p_z)$$


(Sect. 8.9.4). The dependence on the spatial variables x, y, and z is omitted here for 
short. 
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Now, to each individual component in this row vector, apply the gradient operator 


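$\nabla$:

$$\nabla \nabla^t p = \left(\nabla p_x \mid \nabla p_y \mid \nabla p_z\right) = \begin{pmatrix} p_{xx} & p_{yx} & p_{zx} \\ p_{xy} & p_{yy} & p_{zy} \\ p_{xz} & p_{yz} & p_{zz} \end{pmatrix}.$$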


This is the Hessian of p: the 3 x 3 matrix that contains the second partial derivatives 
of p. 

Fortunately, a mixed partial derivative is insensitive to the order in which the 
differentiation is carried out: 


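$$\begin{aligned} p_{xy} &= p_{yx} \\ p_{xz} &= p_{zx} \\ p_{yz} &= p_{zy}. \end{aligned}$$

Thus, the Hessian is a symmetric matrix, equal to its transpose:

$$\nabla \nabla^t p = \begin{pmatrix} p_{xx} & p_{yx} & p_{zx} \\ p_{xy} & p_{yy} & p_{zy} \\ p_{xz} & p_{yz} & p_{zz} \end{pmatrix} = \begin{pmatrix} p_{xx} & p_{xy} & p_{xz} \\ p_{yx} & p_{yy} & p_{yz} \\ p_{zx} & p_{zy} & p_{zz} \end{pmatrix} = \nabla^t \nabla p.$$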


In other words, the Hessian is the Jacobian of the gradient. 
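To see the symmetry in practice, the Hessian can be assembled entry by entry from second partial derivatives. As before, the dictionary storage (exponent triple to coefficient) and the function names are assumptions of this sketch.

```python
def partial(p, axis):
    """Differentiate p with respect to x (axis 0), y (axis 1), or z (axis 2)."""
    result = {}
    for exps, c in p.items():
        e = exps[axis]
        if e > 0:
            new = list(exps)
            new[axis] = e - 1
            key = tuple(new)
            result[key] = result.get(key, 0) + e * c
    return result


def evaluate(p, x, y, z):
    return sum(c * x**i * y**j * z**k for (i, j, k), c in p.items())


def hessian(p, point):
    """3 x 3 matrix of second partial derivatives of p at the given point.

    Column c holds the gradient of the c-th first partial derivative.
    """
    H = []
    for row_axis in range(3):
        row = []
        for col_axis in range(3):
            second = partial(partial(p, col_axis), row_axis)
            row.append(evaluate(second, *point))
        H.append(row)
    return H
```

For p = x^2 y + y z^3 at the point (1, 2, 3), this gives the rows (4, 2, 0), (2, 0, 27), and (0, 27, 36), which is indeed a symmetric matrix.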


8.11.3 Degree 


As discussed in Sect. 8.7.5, the polynomials of two variables $a_i(x, y)$ could be written
more explicitly:

$$a_i(x, y) = \sum_{j \ge 0} \sum_{k \ge 0} a_{i,j,k} x^k y^j,$$

where the $a_{i,j,k}$’s are some scalars. Using this formulation, the original polynomial
of three variables could be written as

$$p(x, y, z) = \sum_{i=0}^{n} \sum_{j \ge 0} \sum_{k \ge 0} a_{i,j,k} x^k y^j z^i.$$

Thus, the degree of p could be much greater than n: it is the maximal sum i + j + k
for which there is a nontrivial monomial of the form $a_{i,j,k} x^k y^j z^i$ (with $a_{i,j,k} \ne 0$).

How many monomials are there? Well, thanks to Sect. 10.15 in [60, 63], a polynomial
of degree m may contain at most

$$\binom{m+3}{3} = \frac{(m+3)!}{m! \cdot 3!} = \frac{(m+1)(m+2)(m+3)}{6}$$


distinct monomials. 


8.12 Exercises 


8.12.1 Taylor Series of Sine and Cosine 


1. Let $u = (u_i)_{0 \le i \le n}$ be an (n + 1)-dimensional vector, and $v = (v_i)_{0 \le i \le m}$ be an
(m + 1)-dimensional vector. Complete both u and v into (n + m + 1)-dimensional
vectors by adding zero dummy components:

$$u_{n+1} = u_{n+2} = \cdots = u_{n+m} = 0$$

and

$$v_{m+1} = v_{m+2} = \cdots = v_{n+m} = 0.$$

2. Define the convolution of u and v, denoted by $u * v$. This is a new (n + m + 1)-dimensional
vector, whose components are

$$(u * v)_k = \sum_{i=0}^{k} u_i v_{k-i}, \quad 0 \le k \le n + m.$$


3. Show symmetry: 
$$u * v = v * u.$$


4. Use u to define the new polynomial 
$$p(x) = \sum_{i=0}^{n} u_i x^i = \sum_{i=0}^{n+m} u_i x^i.$$

5. Likewise, use v to define the new polynomial

$$q(x) = \sum_{i=0}^{m} v_i x^i = \sum_{i=0}^{n+m} v_i x^i.$$

6. Write the product pq in terms of the convolution vector $u * v$:

$$(pq)(x) = \sum_{k=0}^{n+m} (u * v)_k x^k.$$


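7. The infinite Taylor series of the exponential function exp(x) = $e^x$ around zero is

$$\exp(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots = \sum_{n=0}^{\infty} \frac{x^n}{n!}.$$

For a moderate |x|, for a sufficiently large k, this could be truncated after k + 1
terms:

$$\exp(x) \approx \sum_{n=0}^{k} \frac{x^n}{n!}.$$

Write an efficient version of Horner’s algorithm to compute this. Hint: avoid
dividing by a factorial.
8. Do the same for the sine function:

$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!}.$$

9. Do the same for the cosine function:

$$\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!}.$$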


10. Use the above Taylor series to show that, for a given imaginary number ix,

\[
\exp(ix) = \cos(x) + i \cdot \sin(x).
\]

11. Use the above to write the polar decomposition of a complex number.
12. Use the above Taylor series to show that the derivative of exp(x) is exp(x) itself:

\[
\exp'(x) = \exp(x).
\]

13. Use the above Taylor series to show that the derivative of sin(x) is

\[
\sin'(x) = \cos(x).
\]

14. Use the above Taylor series to show that the derivative of cos(x) is

\[
\cos'(x) = -\sin(x).
\]

15. Conclude that

\[
\cos''(x) = -\cos(x).
\]

16. Conclude also that

\[
\sin''(x) = -\sin(x).
\]

17. Conclude that both sin(x) and cos(x) solve the differential equation

\[
u''(x) = -u(x)
\]

for the unknown function u(x).

18. Conclude that every linear combination of sin(x) and cos(x) solves this differ- 
ential equation as well. 

19. Find the unique linear combination that not only solves the above differential
equation but also satisfies the boundary conditions

\[
u'(0) = u(\pi) = 0.
\]
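
As a starting point for Exercises 2, 6, and 7 to 9, here is one possible sketch, not the book's own solution. The function names and the default truncation lengths are assumptions made for illustration. The nested (Horner) form divides by one small integer per step, so no factorial is ever formed.

```python
import math


def convolve(u, v):
    """Convolution (u * v)_k = sum_{i=0}^{k} u_i v_{k-i}, after zero padding."""
    n, m = len(u) - 1, len(v) - 1
    up = list(u) + [0] * m  # pad u to n + m + 1 components
    vp = list(v) + [0] * n  # pad v to n + m + 1 components
    return [sum(up[i] * vp[k - i] for i in range(k + 1)) for k in range(n + m + 1)]


def exp_horner(x, k=30):
    """Truncated series 1 + x(1 + x/2 (1 + x/3 (... (1 + x/k)))), no factorials."""
    result = 1.0
    for n in range(k, 0, -1):
        result = 1.0 + (x / n) * result
    return result


def sin_horner(x, k=15):
    """x (1 - x^2/(2*3) (1 - x^2/(4*5) (...))), truncated after k nested steps."""
    x2 = x * x
    result = 1.0
    for n in range(k, 0, -1):
        result = 1.0 - x2 / ((2 * n) * (2 * n + 1)) * result
    return x * result


def cos_horner(x, k=15):
    """1 - x^2/(1*2) (1 - x^2/(3*4) (...)), truncated after k nested steps."""
    x2 = x * x
    result = 1.0
    for n in range(k, 0, -1):
        result = 1.0 - x2 / ((2 * n - 1) * (2 * n)) * result
    return result


if __name__ == "__main__":
    # Coefficients of (1 + 2x + 3x^2)(4 + 5x), lowest power first.
    print(convolve([1, 2, 3], [4, 5]))
    print(exp_horner(1.0), math.e)
```

The convolution of the two coefficient vectors is the coefficient vector of the product polynomial, which is the content of Exercise 6.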


Chapter 9
Basis Functions: Barycentric Coordinates in 3-D


Thanks to the above background, we are now ready to design a special kind of 
function: basis function (or B-spline). This will be the key to the finite-element 
method, with advanced applications in modern physics and chemistry. 

We start from a simple case: just one tetrahedron. In it, the basis function is defined 
as a polynomial. To each adjacent tetrahedron, the basis function is then extended, 
and defined as a different polynomial. This way, the basis function is piecewise 
polynomial. 

Still, at the interface between two adjacent tetrahedra, these polynomials must 
agree smoothly with each other. In other words, at a face shared by two adjacent 
tetrahedra, the basis function must be continuous. Moreover, across an edge shared 
by two adjacent tetrahedra, the basis function must be smooth: have a continuous 
gradient. These properties will be used later to design a smooth 3-D. This is the 
basis for the finite-element method, used in advanced applications in physics and 
chemistry later on. 


9.1 Tetrahedron and Its Mapping 


9.1.1 General Tetrahedron 


So far, we've considered a special tetrahedron: the unit tetrahedron T, vertexed at
(0, 0, 0), (1, 0, 0), (0, 1, 0), and (0, 0, 1)


(Fig. 8.5). Consider now a more general tetrahedron t, with more general corners (or
vertices): $\mathbf{k}, \mathbf{l}, \mathbf{m}, \mathbf{n} \in \mathbb{R}^3$. Each corner is a point (or a three-dimensional column
vector) in $\mathbb{R}^3$. This way, our new tetrahedron t is denoted by


© Springer Nature Switzerland AG 2019
Y. Shapira, Linear Algebra and Group Theory for Physicists
and Engineers, https://doi.org/10.1007/978-3-030-17856-7_9


Fig. 9.1 A general tetrahedron t, vertexed at k, l, m, and n


\[
t = (\mathbf{k}, \mathbf{l}, \mathbf{m}, \mathbf{n}) \subset \mathbb{R}^3
\]


(Fig. 9.1). Here, the order of the corners is determined arbitrarily in advance.
Let us now map T onto t. For this purpose, consider the vectors leading from k to
the three other corners in t. Together, these column vectors form a new 3 x 3 matrix:

\[
S_t \equiv \left( \mathbf{l} - \mathbf{k} \mid \mathbf{m} - \mathbf{k} \mid \mathbf{n} - \mathbf{k} \right).
\]

Later on, we'll introduce the notion of regularity: t must be nondegenerate. In other
words, its corners mustn't lie on the same plane. This way, $S_t$ is nonsingular: it has
a nonzero determinant.

More than that: t is rather thick, and not too thin. This means that $S_t$ is far from
being singular: its determinant is far away from zero.

To map T onto t, define the new mapping

\[
E_t \begin{pmatrix} x \\ y \\ z \end{pmatrix} \equiv \mathbf{k} + S_t \begin{pmatrix} x \\ y \\ z \end{pmatrix}.
\]

This way, the corners of T map to the corresponding corners of t:

\[
E_t \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} = \mathbf{k}, \quad
E_t \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \mathbf{l}, \quad
E_t \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \mathbf{m}, \quad
E_t \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \mathbf{n}.
\]

Furthermore, in the terminology in Chap. 8, Sect. 8.9.4, $S_t$ is the Jacobian of $E_t$.
Clearly, the inverse mapping maps t back onto T:

\[
E_t^{-1} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = S_t^{-1} \left( \begin{pmatrix} x \\ y \\ z \end{pmatrix} - \mathbf{k} \right).
\]

For this reason, $S_t^{-1}$ is the Jacobian of $E_t^{-1}$.
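
The mapping and its inverse are easy to try out on a concrete tetrahedron. The sketch below is illustrative only: the helper names `make_S`, `E`, and `E_inv` and the sample corners are assumptions, not the book's notation. Exact rational arithmetic is used so that the round trip can be compared without rounding error.

```python
from fractions import Fraction


def det3(M):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def make_S(k, l, m, n):
    """S_t = (l - k | m - k | n - k): the edge vectors from k, stored as columns."""
    cols = [[p[r] - k[r] for r in range(3)] for p in (l, m, n)]
    return [[cols[c][r] for c in range(3)] for r in range(3)]


def E(k, S, p):
    """Forward map E_t(p) = k + S_t p, from the unit tetrahedron T onto t."""
    return [k[r] + sum(S[r][c] * p[c] for c in range(3)) for r in range(3)]


def E_inv(k, S, q):
    """Inverse map E_t^{-1}(q) = S_t^{-1}(q - k), solved by Cramer's rule."""
    d = det3(S)
    if d == 0:
        raise ValueError("degenerate tetrahedron: S_t is singular")
    b = [q[r] - k[r] for r in range(3)]
    sol = []
    for c in range(3):
        M = [row[:] for row in S]
        for r in range(3):
            M[r][c] = b[r]  # replace column c by the right-hand side
        sol.append(Fraction(det3(M)) / d)
    return sol


if __name__ == "__main__":
    k, l, m, n = [1, 0, 0], [2, 1, 0], [0, 2, 1], [1, 1, 3]  # sample corners
    S = make_S(k, l, m, n)
    print(det3(S))                # nonzero, so t is nondegenerate
    print(E(k, S, [1, 0, 0]))     # maps the corner (1, 0, 0) of T to l
    print(E_inv(k, S, n))         # maps n back to the corner (0, 0, 1) of T
```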


9.1.2 Integral over a Tetrahedron 


Let F be an integrable function in t. How to integrate in t? This could be rather
difficult. After all, t is a general tetrahedron. It is better to integrate in T, the unit
tetrahedron. Still, F was never defined in T. What is defined in T is the composite function
$F \circ E_t$. Let's go ahead and integrate it in T:

\[
\iiint_t F(x, y, z)\, dx\, dy\, dz = |\det(S_t)| \iiint_T (F \circ E_t)(x, y, z)\, dx\, dy\, dz.
\]

Let's consider a common example that arises often in practice. Assume that F is a
product of two integrable functions:

\[
F(x, y, z) = f(x, y, z)\, g(x, y, z), \quad (x, y, z) \in t.
\]

To mirror these functions, define new composite functions in T:

\[
\tilde{f} \equiv f \circ E_t \quad \text{and} \quad \tilde{g} \equiv g \circ E_t.
\]

In other words,

\[
f = \tilde{f} \circ E_t^{-1} \quad \text{and} \quad g = \tilde{g} \circ E_t^{-1}
\]

in t. In this case, the above formula takes the form

\[
\begin{aligned}
\iiint_t f g\, dx\, dy\, dz &= |\det(S_t)| \iiint_T (f \circ E_t)(g \circ E_t)\, dx\, dy\, dz \\
&= |\det(S_t)| \iiint_T \tilde{f} \tilde{g}\, dx\, dy\, dz.
\end{aligned}
\]


This will be useful below. 


9.1.3 The Chain Rule


The chain rule tells us how to differentiate a composite function [14]. Thanks to it,
the gradient of f can be written in terms of the gradient of $\tilde{f}$.

Still, f and its gradient are evaluated in t. $\tilde{f}$ and its gradient, on the other hand,
are evaluated in T, not t. Therefore, they must be composed with $E_t^{-1}$. This way, we
can now take the transpose gradient $\nabla^t$, and obtain a row vector:

\[
\nabla^t f(x, y, z) = \nabla^t \tilde{f}\left(E_t^{-1}(x, y, z)\right) S_t^{-1}, \quad (x, y, z) \in t.
\]

The dependence on x, y, and z is often dropped, for short:

\[
\nabla^t f = \left( (\nabla^t \tilde{f}) \circ E_t^{-1} \right) S_t^{-1}
\]

in t. Likewise, we can also take the gradient of g:

\[
\nabla g(x, y, z) = S_t^{-t}\, \nabla \tilde{g}\left(E_t^{-1}(x, y, z)\right), \quad (x, y, z) \in t.
\]

Once the dependence on x, y, and z is dropped, this takes the shorter form

\[
\nabla g = S_t^{-t}\, (\nabla \tilde{g}) \circ E_t^{-1}
\]

in t. We can now take the inner product of these two gradients, and integrate in t:

\[
\iiint_t \nabla^t f\, \nabla g\, dx\, dy\, dz = |\det(S_t)| \iiint_T \nabla^t \tilde{f}\, S_t^{-1} S_t^{-t}\, \nabla \tilde{g}\, dx\, dy\, dz.
\]

This formula will be most useful later. We are now ready to design special polynomials
in t.


9.1.4 Degrees of Freedom 


Consider an unspecified polynomial p(x, y, z) of degree m. How to specify it? Well,
one way is to specify its coefficients $a_{i,j,k}$ ($0 \le i + j + k \le m$). How many coefficients
are there to specify?

Well, thanks to discrete math, we already know the answer:

\[
\binom{m+3}{3} = \frac{(m+3)!}{m! \cdot 3!} = \frac{(m+1)(m+2)(m+3)}{6}
\]

(Chap. 8, Sect. 8.11.3). For m = 5, for example, there is a need to specify

\[
\binom{8}{3} = \frac{6 \cdot 7 \cdot 8}{6} = 56
\]

coefficients of the form $a_{i,j,k}$ ($0 \le i + j + k \le 5$). After all, even a zero coefficient
must be specified as such.

This is a rather explicit way to specify a polynomial. Is there a more implicit 
way? After all, one could also specify any 56 independent pieces of information, or 
degrees of freedom. 

For this purpose, a suitable piece of information is not necessarily a coefficient: 
it could also be the value of p (or any partial or directional derivative of p) at any 
point in the Cartesian space. 

To specify a polynomial p of degree five, for example, let's look at the original
tetrahedron t, and pick degrees of freedom symmetrically in it. At each corner of t,
let's specify the partial derivatives of order 0, 1, and 2.

How many such partial derivatives are there? From Chap. 8, Sect. 8.11.1, the
answer is

\[
\binom{2+3}{3} = \binom{5}{3} = \frac{5 \cdot 4}{2} = 10.
\]

So far, we have already specified a total of 40 degrees of freedom: ten at each corner.
These, however, are not enough: to characterize p uniquely, we must specify 16
more. For this purpose, let's look at the edges of t. In each edge, let's look at the
midpoint, and pick two nontangential derivatives, in directions that are not parallel
to the edge.

For example, consider the edge (k, l), leading from k to l. Its midpoint is (k + l)/2.

First, let's look at the difference: l − k. Without loss of generality, assume that
it is "nearly" in the z-direction. This means that its maximal coordinate (in absolute
value) is the third coordinate:

\[
|(\mathbf{l} - \mathbf{k})_3| \ge \max\left( |(\mathbf{l} - \mathbf{k})_1|,\ |(\mathbf{l} - \mathbf{k})_2| \right).
\]

In this case, at the midpoint (k + l)/2, it wouldn't make sense to pick the z-partial
derivative: it is nearly tangent, and has no chance to be normal. Instead, it would
make more sense to pick the x- and y-partial derivatives, and specify

\[
p_x\left( \frac{\mathbf{k} + \mathbf{l}}{2} \right) \quad \text{and} \quad p_y\left( \frac{\mathbf{k} + \mathbf{l}}{2} \right).
\]

Since there are six edges, this specifies 12 more degrees of freedom.

So far, we've specified 52 degrees of freedom. Still, these are not enough: we
need four more. What to pick? Well, at each side midpoint, let's pick a nontangential
derivative.

For example, look at the face △(k, l, m), vertexed at k, l, and m. Consider its
normal vector: the vector product (l − k) × (m − k). Without loss of generality, assume
that it is "nearly" in the z-direction. In other words, its z-coordinate is maximal:

\[
\left| \left( (\mathbf{l} - \mathbf{k}) \times (\mathbf{m} - \mathbf{k}) \right)_3 \right| \ge
\max\left( \left| \left( (\mathbf{l} - \mathbf{k}) \times (\mathbf{m} - \mathbf{k}) \right)_1 \right|,\
\left| \left( (\mathbf{l} - \mathbf{k}) \times (\mathbf{m} - \mathbf{k}) \right)_2 \right| \right).
\]

In this case, at the midpoint (k + l + m)/3, it wouldn't make sense to pick the
x- or y-partial derivative: they are nearly tangent, and have no chance to be normal.
Instead, it would make more sense to pick the z-partial derivative, and specify

\[
p_z\left( \frac{\mathbf{k} + \mathbf{l} + \mathbf{m}}{3} \right).
\]

Since there are four sides, this specifies four more degrees of freedom. This makes
a total of 56 degrees of freedom, as required.
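
The bookkeeping above can be summarized in a few lines. This is only a tally under the counts stated in the text (ten derivatives per corner, two per edge midpoint, one per side midpoint); the names are illustrative.

```python
from math import comb

CORNERS, EDGES, FACES = 4, 6, 4  # of a single tetrahedron


def derivatives_up_to_order(order: int, variables: int = 3) -> int:
    """Number of partial derivatives of order 0..order in the given number of variables."""
    return comb(order + variables, variables)


def degrees_of_freedom() -> dict:
    """Tally of the degrees of freedom picked in the text for degree five."""
    per_corner = derivatives_up_to_order(2)  # orders 0, 1, and 2
    return {
        "corners": CORNERS * per_corner,
        "edge midpoints": EDGES * 2,   # two nontangential derivatives each
        "side midpoints": FACES * 1,   # one nontangential derivative each
    }


if __name__ == "__main__":
    dof = degrees_of_freedom()
    print(dof, sum(dof.values()), comb(5 + 3, 3))
```

The total matches the number of coefficients of a polynomial of degree five in three variables.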

Below, we’ll see that these degrees of freedom are indeed independent of each 
other: they specify p uniquely, as required. To do this, we need some more geometry 
and algebra. 


9.2 Barycentric Coordinates 


9.2.1 Barycentric Coordinates 


Our original tetrahedron t could also be represented in a different way. For this
purpose, let $\mathbf{d} = (x, y, z)^t$ be some point in t. Usually, d is in the interior of t. Still,
this is not a must: d could also lie on the boundary of t: its faces, edges, or even
corners. Anyway, d is a convex combination of the corners:

\[
\mathbf{d} = \lambda_k \mathbf{k} + \lambda_l \mathbf{l} + \lambda_m \mathbf{m} + \lambda_n \mathbf{n},
\]

where $\lambda_k$, $\lambda_l$, $\lambda_m$, and $\lambda_n$ are nonnegative real numbers that sum to 1:

\[
\lambda_k + \lambda_l + \lambda_m + \lambda_n = 1.
\]

The coefficients $\lambda_k$, $\lambda_l$, $\lambda_m$, and $\lambda_n$ are the barycentric coordinates of d
[10, 43, 52, 57]. Together, they make a new four-dimensional vector:

\[
\lambda \equiv (\lambda_k, \lambda_l, \lambda_m, \lambda_n)^t.
\]

Note that all the above vectors are column vectors, although not quite of the same
dimension: k, l, m, n, and d are three-dimensional column vectors, whereas λ is a
four-dimensional column vector. Thus, the above convex combination can also be
written as a four-dimensional system:

\[
\begin{pmatrix} \mathbf{d} \\ 1 \end{pmatrix} =
\begin{pmatrix} \mathbf{k} & \mathbf{l} & \mathbf{m} & \mathbf{n} \\ 1 & 1 & 1 & 1 \end{pmatrix} \lambda.
\]

This way, d is written in terms of λ. In fact, this is just a projective mapping: the
"oblique" real projective space

\[
\left\{ \lambda \mid \lambda_k + \lambda_l + \lambda_m + \lambda_n = 1 \right\}
\]

is mapped to the "horizontal" real projective space in Chap. 6, Sects. 6.9.1 and 6.9.2.


9.2.2 The Inverse Mapping 


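Fortunately, this mapping could also be inverted, to give λ in terms of d. Indeed, the
above 4 x 4 matrix is nonsingular: it has a nonzero determinant. To see this, let's
multiply it by a new matrix U, a 4 x 4 upper triangular matrix:

\[
U \equiv \begin{pmatrix} 1 & -1 & -1 & -1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
\]

Clearly,

\[
\det(U) = 1.
\]

Therefore,

\[
\begin{aligned}
\det \begin{pmatrix} \mathbf{k} & \mathbf{l} & \mathbf{m} & \mathbf{n} \\ 1 & 1 & 1 & 1 \end{pmatrix}
&= \det \left( \begin{pmatrix} \mathbf{k} & \mathbf{l} & \mathbf{m} & \mathbf{n} \\ 1 & 1 & 1 & 1 \end{pmatrix} U \right) \\
&= \det \begin{pmatrix} \mathbf{k} & \mathbf{l} - \mathbf{k} & \mathbf{m} - \mathbf{k} & \mathbf{n} - \mathbf{k} \\ 1 & 0 & 0 & 0 \end{pmatrix} \\
&= \det \begin{pmatrix} \mathbf{k} & S_t \\ 1 & 0 \end{pmatrix} \\
&= -\det(S_t) \\
&\ne 0.
\end{aligned}
\]


9.2.3 Geometrical Interpretation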


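From Cramer's rule (Chap. 2, Sect. 2.1.5), we can now have the barycentric coordinates
in their explicit form. The first one, for instance, is

\[
\begin{aligned}
\lambda_k(\mathbf{d})
&= \frac{\det \begin{pmatrix} \mathbf{d} & \mathbf{l} & \mathbf{m} & \mathbf{n} \\ 1 & 1 & 1 & 1 \end{pmatrix}}
        {\det \begin{pmatrix} \mathbf{k} & \mathbf{l} & \mathbf{m} & \mathbf{n} \\ 1 & 1 & 1 & 1 \end{pmatrix}} \\
&= \frac{\det \begin{pmatrix} \mathbf{d} & \mathbf{l} - \mathbf{d} & \mathbf{m} - \mathbf{d} & \mathbf{n} - \mathbf{d} \\ 1 & 0 & 0 & 0 \end{pmatrix}}
        {\det \begin{pmatrix} \mathbf{k} & \mathbf{l} - \mathbf{k} & \mathbf{m} - \mathbf{k} & \mathbf{n} - \mathbf{k} \\ 1 & 0 & 0 & 0 \end{pmatrix}} \\
&= \frac{\det \begin{pmatrix} \mathbf{d} & S_{(\mathbf{d},\mathbf{l},\mathbf{m},\mathbf{n})} \\ 1 & 0 \end{pmatrix}}
        {\det \begin{pmatrix} \mathbf{k} & S_t \\ 1 & 0 \end{pmatrix}} \\
&= \frac{-\det\left( S_{(\mathbf{d},\mathbf{l},\mathbf{m},\mathbf{n})} \right)}{-\det(S_t)} \\
&= \frac{\det\left( S_{(\mathbf{d},\mathbf{l},\mathbf{m},\mathbf{n})} \right)}{\det(S_t)}.
\end{aligned}
\]

This gives $\lambda_k$ an interesting geometrical meaning. To see this, just draw four new
edges, leading from d to the corners of t. This splits t into four disjoint subtetrahedra,
each vertexed at d and three corners of t.

Now, look at one particular subtetrahedron, vertexed at d, l, m, and n. Calculate
its volume, and divide it by the volume of t. This is indeed $\lambda_k$.

In summary, $\lambda_k$ is the relative volume of (d, l, m, n): the subtetrahedron that lies
across from k in t. Similar formulas can also be written for the three other barycentric
coordinates:

\[
\lambda_l(\mathbf{d}) = \frac{\det\left( S_{(\mathbf{k},\mathbf{d},\mathbf{m},\mathbf{n})} \right)}{\det(S_t)}, \quad
\lambda_m(\mathbf{d}) = \frac{\det\left( S_{(\mathbf{k},\mathbf{l},\mathbf{d},\mathbf{n})} \right)}{\det(S_t)}, \quad
\lambda_n(\mathbf{d}) = \frac{\det\left( S_{(\mathbf{k},\mathbf{l},\mathbf{m},\mathbf{d})} \right)}{\det(S_t)}.
\]

Why do the barycentric coordinates sum to 1? We can now interpret this not only
algebraically but also geometrically: the four subtetrahedra sum to the original
tetrahedron t.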
Furthermore, it is now easy to see that, at the corners, the barycentric coordinates
are either 0 or 1:

\[
\lambda_i(\mathbf{j}) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j, \end{cases}
\qquad i, j \in \{k, l, m, n\}.
\]

After all, in the special case in which d is a corner, one subtetrahedron is t, whereas
the three others are degenerate. This nice result will be useful later.
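
The volume-ratio formulas translate directly into code. The following sketch is an illustration, not the book's implementation: the function names and the sample tetrahedron are assumptions. Each barycentric coordinate is computed as a ratio of two 3 x 3 determinants, in exact rational arithmetic.

```python
from fractions import Fraction


def det_cols(a, b, c):
    """Determinant of the 3 x 3 matrix whose columns are a, b, and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def S_det(p0, p1, p2, p3):
    """det(p1 - p0 | p2 - p0 | p3 - p0): six times the signed volume of (p0, p1, p2, p3)."""
    e1 = [p1[i] - p0[i] for i in range(3)]
    e2 = [p2[i] - p0[i] for i in range(3)]
    e3 = [p3[i] - p0[i] for i in range(3)]
    return det_cols(e1, e2, e3)


def barycentric(d, k, l, m, n):
    """Barycentric coordinates of d: each corner is replaced by d in turn."""
    total = S_det(k, l, m, n)
    if total == 0:
        raise ValueError("degenerate tetrahedron")
    subs = (S_det(d, l, m, n), S_det(k, d, m, n), S_det(k, l, d, n), S_det(k, l, m, d))
    return [Fraction(s) / total for s in subs]


if __name__ == "__main__":
    k, l, m, n = [1, 0, 0], [2, 1, 0], [0, 2, 1], [1, 1, 3]  # sample corners
    print(barycentric([1, 1, 1], k, l, m, n))  # centroid of the four corners
    print(barycentric(k, k, l, m, n))          # the corner k itself
```

At a corner, one coordinate is 1 and the others are 0, in agreement with the formula above.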


9.2.4 The Chain Rule and Leibnitz Rule 


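As discussed above, we now have λ in terms of d:

\[
\lambda = \begin{pmatrix} \mathbf{k} & \mathbf{l} & \mathbf{m} & \mathbf{n} \\ 1 & 1 & 1 & 1 \end{pmatrix}^{-1}
\begin{pmatrix} \mathbf{d} \\ 1 \end{pmatrix}.
\]

Thus, λ ≡ λ(d) is a function of d. More specifically, the individual barycentric
coordinates are functions of d as well:

\[
\lambda_k \equiv \lambda_k(\mathbf{d}), \quad \lambda_l \equiv \lambda_l(\mathbf{d}), \quad
\lambda_m \equiv \lambda_m(\mathbf{d}), \quad \text{and} \quad \lambda_n \equiv \lambda_n(\mathbf{d}).
\]

We can now differentiate these functions with respect to x, y, and z. For this purpose,
let's look at the above inverse matrix:

\[
\begin{pmatrix} \mathbf{k} & \mathbf{l} & \mathbf{m} & \mathbf{n} \\ 1 & 1 & 1 & 1 \end{pmatrix}^{-1}.
\]

Look at its first, second, and third columns. Together, they make a 4 x 3 rectangular
matrix: the Jacobian of λ with respect to d:

\[
\frac{\partial \lambda}{\partial \mathbf{d}} \equiv \frac{\partial (\lambda_k, \lambda_l, \lambda_m, \lambda_n)}{\partial (x, y, z)}.
\]

Fortunately, we have Cramer's formula to help write the individual matrix elements
explicitly (Chap. 2, Sect. 2.1.4). Assume that this has already been done. This way,
$\partial\lambda/\partial\mathbf{d}$ can now be used in the chain rule. This will be quite useful: it will help
differentiate a composite function of λ ≡ λ(d).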
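To see this, let

\[
f(\lambda) \quad \text{and} \quad g(\lambda)
\]

be two differentiable functions. This means that they have a gradient with respect to
λ:

\[
\nabla_\lambda f \equiv \begin{pmatrix} \partial f/\partial \lambda_k \\ \partial f/\partial \lambda_l \\ \partial f/\partial \lambda_m \\ \partial f/\partial \lambda_n \end{pmatrix}
\quad \text{and} \quad
\nabla_\lambda g \equiv \begin{pmatrix} \partial g/\partial \lambda_k \\ \partial g/\partial \lambda_l \\ \partial g/\partial \lambda_m \\ \partial g/\partial \lambda_n \end{pmatrix}.
\]

Later on, we'd like to differentiate a product like fg with respect to λ. This is done
as in the Leibniz rule:

\[
\nabla_\lambda (f g) = (\nabla_\lambda f)\, g + f\, \nabla_\lambda g = g\, \nabla_\lambda f + f\, \nabla_\lambda g.
\]

Likewise, the transpose gradient (with respect to λ) is

\[
\nabla_\lambda^t (f g) = g\, \nabla_\lambda^t f + f\, \nabla_\lambda^t g.
\]

To both sides of this equation, apply $\nabla_\lambda$ from the left:

\[
\begin{aligned}
\nabla_\lambda \nabla_\lambda^t (f g) &= \nabla_\lambda \left( g\, \nabla_\lambda^t f + f\, \nabla_\lambda^t g \right) \\
&= \nabla_\lambda g\, \nabla_\lambda^t f + g\, \nabla_\lambda \nabla_\lambda^t f + \nabla_\lambda f\, \nabla_\lambda^t g + f\, \nabla_\lambda \nabla_\lambda^t g.
\end{aligned}
\]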


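Now, because λ ≡ λ(d), both f and g are actually composite functions of x, y,
and z. As such, they can be differentiated with respect to x, y, and z, to form the
gradient. Fortunately, we have the chain rule to help do this:

\[
\nabla^t f = \nabla_\lambda^t f\, \frac{\partial \lambda}{\partial \mathbf{d}}
\quad \text{and} \quad
\nabla f = \left( \frac{\partial \lambda}{\partial \mathbf{d}} \right)^t \nabla_\lambda f.
\]

Thus, the Hessian of f (with respect to x, y, and z) is

\[
\nabla \nabla^t f = \nabla\, \nabla_\lambda^t f \left( \frac{\partial \lambda}{\partial \mathbf{d}} \right)
= \left( \frac{\partial \lambda}{\partial \mathbf{d}} \right)^t \nabla_\lambda \nabla_\lambda^t f \left( \frac{\partial \lambda}{\partial \mathbf{d}} \right).
\]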


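9.2.5 Integration in Barycentric Coordinates


Now, look at the oblique real projective space, introduced at the end of Sect. 9.2.1:

\[
\lambda_n = 1 - \lambda_k - \lambda_l - \lambda_m.
\]

This is easy to integrate in. Indeed, as in Sect. 9.1.2, take the inner product of the
above gradients, and integrate in t:

\[
\iiint_t \nabla^t f\, \nabla g\, dx\, dy\, dz
= |\det(S_t)| \int_0^1 d\lambda_k \int_0^{1-\lambda_k} d\lambda_l \int_0^{1-\lambda_k-\lambda_l} d\lambda_m\;
\nabla_\lambda^t f \left( \frac{\partial \lambda}{\partial \mathbf{d}} \right) \left( \frac{\partial \lambda}{\partial \mathbf{d}} \right)^t \nabla_\lambda g,
\]

where the gradients on the right, $\nabla_\lambda f$ and $\nabla_\lambda g$, are evaluated at four-dimensional
points of the form

\[
(\lambda_k, \lambda_l, \lambda_m, 1 - \lambda_k - \lambda_l - \lambda_m).
\]

Thus, this is just an integral over the unit tetrahedron T, with $\lambda_k$, $\lambda_l$, and $\lambda_m$
playing the role of x, y, and z.
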

Fortunately, we already know how to do this (Chap. 8, Sect. 8.9.5). This will be useful
later. Our functions f and g will need to be smooth not only in t but also outside it.
Let's see how such a function should indeed be extended smoothly to a neighbor
tetrahedron as well.


9.3 How to Match Two Tetrahedra?


9.3.1 Continuity Across an Edge 


In Sect. 9.1.4, we've already introduced 56 independent degrees of freedom. In what
sense are they independent? In the following sense:


A polynomial of degree five with 56 vanishing degrees of freedom must be
identically zero.


Let's go ahead and prove this.
For this purpose, consider a polynomial p(x, y, z) of degree five (or less).
Consider the edge

(k, l),

leading from corner k to corner l in t. In this edge, $\lambda_m = \lambda_n = 0$. Thus, unless p
vanishes throughout the entire edge, p mustn't contain a factor $\lambda_m$ or $\lambda_n$.

For a start, assume that p has 20 vanishing degrees of freedom: ten vanishing
partial derivatives (of order 0, 1, and 2) at k, and the same at l. So, what does p look
like in the edge? Well, along the edge, look at p, and at its first and second tangential
derivatives. All three vanish at both endpoints: k and l. Thus, once restricted to the edge,
p must contain cubic factors of the form $\lambda_k^3$ and $\lambda_l^3$:

\[
p \mid_{(\mathbf{k},\mathbf{l})} = \lambda_k^3 \lambda_l^3 \cdots = \lambda_k^3 (1 - \lambda_k)^3 \cdots
\]

(times an unknown factor, possibly zero). But p is of degree five at most, and mustn't
contain such a big factor. Therefore, p must vanish throughout the entire edge:

\[
p \mid_{(\mathbf{k},\mathbf{l})} \equiv 0.
\]

As a bonus, p also has a zero tangential derivative along the entire edge. Furthermore,
in the face △(k, l, m), p must contain (at least a linear) factor $\lambda_m$:

\[
p \mid_{\triangle(\mathbf{k},\mathbf{l},\mathbf{m})} = \lambda_m \cdots
\]

(times an unknown factor, possibly zero). Later on, we'll use this to redefine p
continuously outside t as well.


9.3.2 Smoothness Across an Edge 


Furthermore, under some more conditions, the above factor is not only linear but
also quadratic:

\[
p \mid_{\triangle(\mathbf{k},\mathbf{l},\mathbf{m})} = \lambda_m^2 \cdots
\]

(times an unknown factor, possibly zero). To have this, let's look at the gradient of
p: ∇p. What does it look like in the edge (k, l)? Well, at the endpoints, we already
know that ∇p vanishes. Moreover, since the second partial derivatives vanish there too,
∇p also has a zero tangential derivative at both k and l. Thus,
once restricted to the edge, ∇p must contain quadratic factors of the form $\lambda_k^2$ and $\lambda_l^2$:

\[
\nabla p \mid_{(\mathbf{k},\mathbf{l})} = \beta \lambda_k^2 \lambda_l^2 = \beta \lambda_k^2 (1 - \lambda_k)^2.
\]

Here, since ∇p is a polynomial of degree four (or less), β must be a constant
three-dimensional vector.
Assume now that p has two more vanishing degrees of freedom: at the midpoint
(k + l)/2, it also has two vanishing nontangential derivatives. Thus, thanks to the
bonus at the end of Sect. 9.3.1, its gradient vanishes there:

\[
\nabla p \left( \frac{\mathbf{k} + \mathbf{l}}{2} \right) = (0, 0, 0)^t.
\]

Therefore, we must have $\beta = (0, 0, 0)^t$, so ∇p vanishes throughout the entire edge:

\[
\nabla p \mid_{(\mathbf{k},\mathbf{l})} \equiv (0, 0, 0)^t.
\]

Therefore, in the face △(k, l, m), the original polynomial p must contain (not only
a linear but also) a quadratic factor:

\[
p \mid_{\triangle(\mathbf{k},\mathbf{l},\mathbf{m})} = \lambda_m^2 \cdots
\]

(times an unknown factor, possibly zero).


9.3.3 Continuity Across a Side 


Look now at the face △(k, l, m), and the edges in it. Assume now that p has vanishing
degrees of freedom not only in (k, l) but also in the two other edges: (l, m) and (m, k).
This way, p has a total of 36 vanishing degrees of freedom: ten vanishing partial
derivatives (of orders 0, 1, and 2) at k, l, and m, and two vanishing nontangential
derivatives at (k + l)/2, (l + m)/2, and (m + k)/2.

Thus, the discussion in Sect. 9.3.2 applies to all three edges: not only (k, l) but
also (l, m) and (m, k). Therefore, p must contain a factor as big as

\[
p \mid_{\triangle(\mathbf{k},\mathbf{l},\mathbf{m})} = \lambda_m^2 \lambda_k^2 \lambda_l^2 \cdots
\]

But p is of degree five at most, so it must vanish throughout the entire face:

\[
p \mid_{\triangle(\mathbf{k},\mathbf{l},\mathbf{m})} \equiv 0.
\]

As a bonus, p also has two vanishing tangential derivatives along the entire face. This
will be useful later.
Unfortunately, the gradient of p, although it vanishes on the edges, does not necessarily
vanish throughout the entire face. For example, ∇p may still take there the nonzero
form

\[
(\nabla p) \mid_{\triangle(\mathbf{k},\mathbf{l},\mathbf{m})} = \gamma \lambda_k \lambda_l \lambda_m,
\]

where γ is a constant nonzero three-dimensional vector.

As a matter of fact, even if ∇p happened to vanish at the side midpoint (k + l + m)/3
as well, it might not vanish throughout the entire side. After all, as a polynomial of
degree four, it might still take the nonzero form

\[
\begin{aligned}
(\nabla p) \mid_{\triangle(\mathbf{k},\mathbf{l},\mathbf{m})} &= \gamma \lambda_k \lambda_l \lambda_m \left( \tfrac{1}{3} - \lambda_k \right), \\
\text{or} \quad (\nabla p) \mid_{\triangle(\mathbf{k},\mathbf{l},\mathbf{m})} &= \gamma \lambda_k \lambda_l \lambda_m \left( \tfrac{1}{3} - \lambda_l \right), \\
\text{or} \quad (\nabla p) \mid_{\triangle(\mathbf{k},\mathbf{l},\mathbf{m})} &= \gamma \lambda_k \lambda_l \lambda_m \left( \tfrac{1}{3} - \lambda_m \right).
\end{aligned}
\]

So, we are stuck: we can say no more about ∇p. Fortunately, we can still say more
about p itself. In fact, since p vanishes throughout △(k, l, m), it must contain a linear
factor of the form

\[
p = \lambda_n \cdots
\]

Assume now that p has vanishing degrees of freedom in the three other faces as well.
This way, in total, p has 52 vanishing degrees of freedom: ten at each corner, and
two at each edge midpoint. We can then do the same in each face. As a result, p must
contain the factor

\[
p = \lambda_k \lambda_l \lambda_m \lambda_n \cdots
\]

More precisely, since p is of degree five at most, it must be of the form

\[
p = \lambda_k \lambda_l \lambda_m \lambda_n \left( \alpha_k \lambda_k + \alpha_l \lambda_l + \alpha_m \lambda_m + \alpha_n \lambda_n \right),
\]

where $\alpha_k$, $\alpha_l$, $\alpha_m$, and $\alpha_n$ are some scalars.


9.4 Piecewise-Polynomial Function 


9.4.1 Independent Degrees of Freedom 


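Assume now that p also has four more vanishing degrees of freedom: at each side
midpoint, it has a vanishing nontangential derivative. Thanks to the bonus in Sect. 9.3.3,
at each side midpoint, p actually has a zero gradient. So, in total, p indeed has 56
vanishing degrees of freedom in t. Is p identically zero?

Well, to find out, let's calculate its explicit gradient at the side midpoint
(k + l + m)/3. Clearly, at this point, the barycentric coordinates are

\[
\lambda_k = \lambda_l = \lambda_m = \frac{1}{3} \quad \text{and} \quad \lambda_n = 0.
\]

Thanks to the chain rule (Sect. 9.2.4), we can calculate the gradient indirectly: apply
$\nabla_\lambda$ rather than ∇. This way, we differentiate with respect to $\lambda_k$, $\lambda_l$, $\lambda_m$, and $\lambda_n$ rather
than x, y, or z.

What happens when such a differentiation is carried out? Well, upon evaluating
at $\lambda_n = 0$, the term that contains $\alpha_n$ drops. The three other terms, on the other hand,
must be differentiated with respect to $\lambda_n$, or they'd vanish as well. In summary, at
(k + l + m)/3, we have

\[
\begin{aligned}
(0, 0, 0) &= \nabla^t p \\
&= \nabla_\lambda^t p\, \frac{\partial \lambda}{\partial \mathbf{d}} \\
&= \nabla_\lambda^t \left( \lambda_k \lambda_l \lambda_m \lambda_n \left( \alpha_k \lambda_k + \alpha_l \lambda_l + \alpha_m \lambda_m + \alpha_n \lambda_n \right) \right) \frac{\partial \lambda}{\partial \mathbf{d}} \\
&= \lambda_k \lambda_l \lambda_m\, \nabla_\lambda^t \lambda_n \left( \alpha_k \lambda_k + \alpha_l \lambda_l + \alpha_m \lambda_m \right) \frac{\partial \lambda}{\partial \mathbf{d}} \\
&= \lambda_k \lambda_l \lambda_m\, (0, 0, 0, 1) \left( \alpha_k \lambda_k + \alpha_l \lambda_l + \alpha_m \lambda_m \right) \frac{\partial \lambda}{\partial \mathbf{d}} \\
&= 3^{-4}\, (0, 0, 0, 1) \left( \alpha_k + \alpha_l + \alpha_m \right) \frac{\partial \lambda}{\partial \mathbf{d}}.
\end{aligned}
\]

Now, the fourth row in $\partial\lambda/\partial\mathbf{d}$ can't be (0, 0, 0), or $\lambda_n$ would be just a constant,
which is impossible. Therefore, we must have

\[
\alpha_k + \alpha_l + \alpha_m = 0.
\]

The same could be done at the three other side midpoints as well. In summary, we
have four linear equations:

\[
\begin{aligned}
\alpha_l + \alpha_m + \alpha_n &= 0 \\
\alpha_k + \alpha_m + \alpha_n &= 0 \\
\alpha_k + \alpha_l + \alpha_n &= 0 \\
\alpha_k + \alpha_l + \alpha_m &= 0.
\end{aligned}
\]

More compactly, this could be written as a four-dimensional linear system:

\[
\begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}
\begin{pmatrix} \alpha_k \\ \alpha_l \\ \alpha_m \\ \alpha_n \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.
\]

Look at this 4 x 4 matrix. Is it singular? Fortunately not. Indeed, it has no zero
eigenvalue. In fact, it has only two eigenvalues: 3 and −1. To see this, just write it as

\[
\begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}
= \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix} - I,
\]

where I is the 4 x 4 identity matrix.

On the right-hand side, look at the first matrix. It can be written as a column vector
times a row vector:

\[
\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}
= \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} (1, 1, 1, 1).
\]

What are the eigenvectors? Well, $(1, 1, 1, 1)^t$ is an eigenvector, with the eigenvalue 4.
Every vector orthogonal to $(1, 1, 1, 1)^t$, on the other hand, has the eigenvalue 0. Thus,
once I is subtracted, we get the eigenvalues 3 and −1, as asserted.

Thus, the only solution to the above linear system is

\[
\alpha_k = \alpha_l = \alpha_m = \alpha_n = 0.
\]

Thus, our original polynomial is identically zero:

\[
p \equiv 0.
\]

This means that our original 56 degrees of freedom are indeed independent of each
other, as asserted. This is the key for designing a basis function.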
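
The eigenvalue argument is easy to confirm numerically. The sketch below is illustrative (the helper names `det` and `matvec` are assumptions): it builds the 4 x 4 matrix with zeros on the diagonal and ones elsewhere, checks one eigenvector for each eigenvalue, and evaluates the determinant by cofactor expansion.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for c, entry in enumerate(M[0]):
        minor = [row[:c] + row[c + 1:] for row in M[1:]]
        total += (-1) ** c * entry * det(minor)
    return total


def matvec(M, v):
    """Matrix-vector product."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]


# The all-ones matrix minus the identity.
A = [[0 if i == j else 1 for j in range(4)] for i in range(4)]

if __name__ == "__main__":
    print(matvec(A, [1, 1, 1, 1]))    # 3 times (1, 1, 1, 1)
    print(matvec(A, [1, -1, 0, 0]))   # -1 times a vector orthogonal to (1, 1, 1, 1)
    print(det(A))                     # product of the eigenvalues 3, -1, -1, -1
```

Since the determinant is nonzero, the homogeneous system has only the zero solution.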


9.4.2 Smooth Piecewise-Polynomial Function 


Let $t_1$ and $t_2$ be two neighbor tetrahedra that share a joint edge: $t_1 \cap t_2$. Let $p_1$ and $p_2$ be
two different polynomials of degree five (or less), defined in $t_1$ and $t_2$, respectively.
Assume also that, in their joint edge $t_1 \cap t_2$, both $p_1$ and $p_2$ share the same 22
degrees of freedom: ten partial derivatives at each endpoint, and two nontangential
derivatives at the midpoint. As discussed above, $p_1 - p_2$ and $\nabla p_1 - \nabla p_2$ must then
vanish throughout the entire edge.

Now, in the union $t_1 \cup t_2$, define a new piecewise-polynomial function:

\[
u(x, y, z) \equiv \begin{cases} p_1(x, y, z) & \text{if } (x, y, z) \in t_1 \\ p_2(x, y, z) & \text{if } (x, y, z) \in t_2. \end{cases}
\]

How do we know that u is smooth? Well, in $t_1 \cap t_2$, u is continuous: it is the same
from both sides. Furthermore, ∇u is continuous as well: it is the same from both
sides as well. These properties will be useful later.


9.4.3 Continuous Piecewise-Polynomial Function


Assume now that $t_1$ and $t_2$ share not only a mere edge but also a complete face: $t_1 \cap t_2$
is now a face, not just an edge. Moreover, assume now that both $p_1$ and $p_2$ share
the same 36 degrees of freedom in $t_1 \cap t_2$: ten at each vertex, and two more at each
edge midpoint. As discussed above, $p_1 - p_2$ must then vanish throughout the entire
face. Thus, u is continuous throughout the entire face $t_1 \cap t_2$: it is the same from both
sides. This property will be useful later: it will help design a piecewise-polynomial
basis function.


9.5 Basis Functions 


9.5.1 Side-Midpoint Basis Function 


We are now ready to design our first basis function. For a start, we define it in t as a
polynomial of degree five. Later on, we'll extend it continuously outside t as well.
Consider, for example, the side midpoint

\[
\mathbf{w} \equiv \frac{\mathbf{k} + \mathbf{l} + \mathbf{m}}{3}.
\]

Recall that, at w, the barycentric coordinates are

\[
\lambda_k = \lambda_l = \lambda_m = \frac{1}{3} \quad \text{and} \quad \lambda_n = 0.
\]

Assume that

\[
\frac{\partial \lambda_n}{\partial z} \ne 0.
\]

After all, in Sect. 9.1.4, we've already assumed that this partial derivative is far away
from zero. This way, at w, the degree of freedom is just the z-partial derivative. Now,
let's define the corresponding basis function $\psi_w$ in t:

\[
\psi_w \equiv \alpha \lambda_k \lambda_l \lambda_m \lambda_n \left( \frac{1}{3} - \lambda_n \right),
\]

where α is a constant scalar, to be specified later.
Why is this a good candidate for a basis function? Because it has just one nonzero
degree of freedom. Indeed,

• at every corner in t, there is a triple product of vanishing barycentric coordinates.
For this reason, the partial derivatives of order 0, 1, and 2 vanish there, as required.

• Furthermore, at every edge midpoint, there are two vanishing factors, so the degrees
of freedom (the nontangential derivatives) vanish there as well.

• Moreover, at every side midpoint but w, there are two vanishing factors, so the
degree of freedom (the nontangential derivative) vanishes there as well.

• So, our only task is to pick α cleverly, to make sure that the final degree of freedom
(the nontangential derivative at w) is correct.


So, what should α be? Pick α to make the z-partial derivative equal to 1 at w. How
to do this? As in Sect. 9.4.1, use the chain rule. This way, at w,

\[
\begin{aligned}
(\cdot, \cdot, 1) &= \nabla^t \psi_w \\
&= \nabla_\lambda^t \psi_w\, \frac{\partial \lambda}{\partial \mathbf{d}} \\
&= \nabla_\lambda^t \left( \alpha \lambda_k \lambda_l \lambda_m \lambda_n \left( \frac{1}{3} - \lambda_n \right) \right) \frac{\partial \lambda}{\partial \mathbf{d}} \\
&= \alpha \lambda_k \lambda_l \lambda_m\, \nabla_\lambda^t \lambda_n \left( \frac{1}{3} - \lambda_n \right) \frac{\partial \lambda}{\partial \mathbf{d}} \\
&= \alpha\, 3^{-4}\, (0, 0, 0, 1)\, \frac{\partial \lambda}{\partial \mathbf{d}}.
\end{aligned}
\]

More explicitly, define

\[
\alpha \equiv \frac{3^4}{\partial \lambda_n / \partial z}.
\]

This way, only at w is the degree of freedom equal to 1. All the rest vanish, as
required. Thus, $\psi_w$ is our first basis function.

In practice, we may have not only one tetrahedron t but also a complete mesh. In
particular, t may have a neighbor tetrahedron from the other side of △(k, l, m). How
to define $\psi_w$ there? Just use the same approach, and define $\psi_w$ as a different polynomial
there. Still, at the joint side, both definitions agree with each other (Sect. 9.4.3).
This way, $\psi_w$ is indeed a continuous piecewise-polynomial function, as required.

In the rest of the mesh, on the other hand, $\psi_w$ is defined as zero. This way, $\psi_w$ is
indeed a proper basis function: continuous and piecewise-polynomial, with just one
nonzero degree of freedom: at w only.

The above could be done not only at w but also at any other side midpoint. Next,
we define yet another kind of basis function.
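
The side-midpoint construction can be checked numerically on a sample tetrahedron. The sketch below rests on two assumptions that should be read as such: the basis function is taken in the form α λ_k λ_l λ_m λ_n (1/3 − λ_n), and α is taken as 3^4 divided by ∂λ_n/∂z. The helper names and the sample corners are illustrative. Derivatives are approximated by central differences, so the comparison is only up to a small tolerance.

```python
def det_cols(a, b, c):
    """Determinant of the 3 x 3 matrix whose columns are a, b, and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def sub(p, q):
    return [p[i] - q[i] for i in range(3)]


def barycentric(d, verts):
    """Barycentric coordinates of d as ratios of determinants (relative volumes)."""
    k, l, m, n = verts
    total = det_cols(sub(l, k), sub(m, k), sub(n, k))
    return (det_cols(sub(l, d), sub(m, d), sub(n, d)) / total,
            det_cols(sub(d, k), sub(m, k), sub(n, k)) / total,
            det_cols(sub(l, k), sub(d, k), sub(n, k)) / total,
            det_cols(sub(l, k), sub(m, k), sub(d, k)) / total)


def face_midpoint(a, b, c):
    return [(a[i] + b[i] + c[i]) / 3.0 for i in range(3)]


def make_alpha(verts):
    """Assumed scaling: alpha = 3^4 / (d lambda_n / dz)."""
    k, l, m, n = verts
    w = face_midpoint(k, l, m)
    shifted = [w[0], w[1], w[2] + 1.0]
    # lambda_n is affine in d, so a unit step in z gives its exact z-derivative.
    dln_dz = barycentric(shifted, verts)[3] - barycentric(w, verts)[3]
    if abs(dln_dz) < 1e-12:
        raise ValueError("lambda_n does not depend on z: pick another derivative")
    return 81.0 / dln_dz


def psi_w(d, verts, alpha):
    """Assumed form of the side-midpoint basis function inside t."""
    lk, ll, lm, ln = barycentric(d, verts)
    return alpha * lk * ll * lm * ln * (1.0 / 3.0 - ln)


def z_derivative(func, d, h=1e-5):
    """Central-difference approximation of the z-partial derivative."""
    up = [d[0], d[1], d[2] + h]
    down = [d[0], d[1], d[2] - h]
    return (func(up) - func(down)) / (2.0 * h)


if __name__ == "__main__":
    verts = ([0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.5], [0.5, 0.5, 3.0])
    alpha = make_alpha(verts)
    w = face_midpoint(verts[0], verts[1], verts[2])
    print(z_derivative(lambda d: psi_w(d, verts, alpha), w))  # close to 1
```

At the side midpoint (k + l + n)/3, where λ_m = 0 and λ_n = 1/3, the same finite difference comes out close to zero, in line with the bullet list above.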


9.5.2 Edge-Midpoint Basis Function


Consider now the edge midpoint

\[
\mathbf{h} \equiv \frac{\mathbf{k} + \mathbf{l}}{2}.
\]

Clearly, at h, the barycentric coordinates are

\[
\lambda_k = \lambda_l = \frac{1}{2} \quad \text{and} \quad \lambda_m = \lambda_n = 0.
\]

Assume that, in the Jacobian $\partial\lambda/\partial\mathbf{d}$, the 2 x 2 lower left block is nonsingular:

\[
\det\left( \frac{\partial (\lambda_m, \lambda_n)}{\partial (x, y)} \right) \ne 0.
\]

After all, in Sect. 9.1.4, we've already assumed that this determinant is far away from
zero. This way, at h, the degrees of freedom are just the x- and y-partial derivatives.
Let's start with the x-partial derivative. Let's introduce the corresponding basis
function, $\psi_{h,1}$:

\[
\psi_{h,1} \equiv \lambda_k^2 \lambda_l^2 \left( \alpha_m \lambda_m + \alpha_n \lambda_n \right),
\]

where $\alpha_m$ and $\alpha_n$ are constant scalars, to be specified later.
Why is this a good candidate for a basis function? Because it has just one nonzero
degree of freedom. Indeed,


• at every corner of t, there is here a product of at least three vanishing barycentric
coordinates. For this reason, the partial derivatives of order 0, 1, and 2 vanish there,
as required.

• Furthermore, at every edge midpoint but h, the quadratic factor $\lambda_k^2$ or $\lambda_l^2$ vanishes,
so the nontangential derivatives vanish there as well.

• Moreover, at every side midpoint across from h, the quadratic factor $\lambda_k^2$ or $\lambda_l^2$
vanishes, so the nontangential derivative vanishes there as well.

• Later on, we'll also make sure that the degrees of freedom vanish at those side
midpoints nearby h.

• Before doing this, we have more urgent business: to make sure that the degrees of
freedom are correct at h itself. For this purpose, pick the above α's cleverly.


So, what should $\alpha_m$ and $\alpha_n$ be? Pick them to make the x-partial derivative equal to
1, and the y-partial derivative equal to 0 at h. In other words, at h,

\[
\begin{aligned}
(1, 0, \cdot) &= \nabla^t \psi_{h,1} \\
&= \nabla_\lambda^t \psi_{h,1}\, \frac{\partial \lambda}{\partial \mathbf{d}} \\
&= \nabla_\lambda^t \left( \lambda_k^2 \lambda_l^2 \left( \alpha_m \lambda_m + \alpha_n \lambda_n \right) \right) \frac{\partial \lambda}{\partial \mathbf{d}} \\
&= \lambda_k^2 \lambda_l^2\, \nabla_\lambda^t \left( \alpha_m \lambda_m + \alpha_n \lambda_n \right) \frac{\partial \lambda}{\partial \mathbf{d}} \\
&= \lambda_k^2 \lambda_l^2 \left( \alpha_m \nabla_\lambda^t \lambda_m + \alpha_n \nabla_\lambda^t \lambda_n \right) \frac{\partial \lambda}{\partial \mathbf{d}} \\
&= 2^{-4} \left( \alpha_m (0, 0, 1, 0) + \alpha_n (0, 0, 0, 1) \right) \frac{\partial \lambda}{\partial \mathbf{d}} \\
&= 2^{-4}\, (0, 0, \alpha_m, \alpha_n)\, \frac{\partial \lambda}{\partial \mathbf{d}}.
\end{aligned}
\]

These are two linear equations in two unknowns: $\alpha_m$ and $\alpha_n$. What is the coefficient
matrix? It is a familiar block: the 2 x 2 lower left block in the original Jacobian
$\partial\lambda/\partial\mathbf{d}$. More precisely, we actually look at the transpose system, so we actually
look at the transpose block. Anyway, it has the same determinant: nonzero. So, it is
nonsingular, as required. Therefore, $\alpha_m$ and $\alpha_n$ can be solved for uniquely.

So, at h, the degrees of freedom are correct. What about the rest of the degrees
of freedom in t? Well, from Sect. 9.2.4, most of them already vanish, as required.
Only at nearby side midpoints may they still be nonzero. How to fix this?

Consider, for example, the side midpoint w, discussed in the previous section.
Fortunately, we already have a basis function: $\psi_w$ in t. Let's go ahead and subtract a
multiple of it from $\psi_{h,1}$ in t:

\[
\psi_{h,1} \leftarrow \psi_{h,1} - (\psi_{h,1})_z(\mathbf{w})\, \psi_w.
\]

Once this substitution is made, we have

\[
(\psi_{h,1})_z(\mathbf{w}) = 0,
\]

as required. Fortunately, this substitution doesn't spoil the rest of the degrees of
freedom, which remain correct.

The same kind of subtraction can also be made for the other nearby side midpoint:
(k + l + n)/3. In t, subtract from $\psi_{h,1}$ the side-midpoint basis function
$\psi_{(\mathbf{k}+\mathbf{l}+\mathbf{n})/3}$, multiplied by the degree of freedom (the nontangential derivative)
of $\psi_{h,1}$ at (k + l + n)/3.

Once this substitution is made, this degree of freedom vanishes as well,
as required. Fortunately, this substitution doesn't spoil any other degree of freedom.
Thus, in its final form, $\psi_{h,1}$ makes a proper basis function in t.

So far, we've defined $\psi_{h,1}$ in t only. It is now time to extend it to the entire
mesh as well. First, what about those neighbor tetrahedra that share (k, l) as their
joint edge? Repeat the same procedure there as well. This way, in each edge-sharing
tetrahedron, $\psi_{h,1}$ is defined as a different polynomial of degree five. Still, across
(k, l), $\psi_{h,1}$ remains smooth (Sect. 9.4.2).

In the rest of the tetrahedra in the mesh, which don't use (k, l) as an edge, $\psi_{h,1}$
is defined as zero. This way, $\psi_{h,1}$ is indeed a proper basis function throughout the
entire mesh.

So far, we've focused on the x-partial derivative at h. Now, let's consider the
y-partial derivative. For this purpose, in the beginning of the above development,
just replace (1, 0, ·) by (0, 1, ·). This produces a new function $\psi_{h,2}$, corresponding
to the y-partial derivative at h, as required.

The same could be done not only at h but also at any other edge midpoint. Next,
we move on to yet another kind of basis function: the corner basis function.


9.5.3 Hessian-Related Corner Basis Function 


Consider the corner $n \in t$. Clearly, at n, the barycentric coordinates are


\[ \lambda_k = \lambda_l = \lambda_m = 0 \quad\text{and}\quad \lambda_n = 1. \]


Let’s define the basis function corresponding to the xx-partial derivative at n:


\[ \psi_{n,5} \equiv \lambda_n^3 \sum_{i,j \in \{k,l,m\}} \alpha_{i,j}\lambda_i\lambda_j. \]


Why is this a good candidate for a basis function? Because it has just one nonzero
degree of freedom. Indeed,


• thanks to the cubic factor $\lambda_n^3$, $\psi_{n,5}$ has vanishing degrees of freedom at k, l, and m,
as required.

• This is also true at the edge and side midpoints, at least those that lie across from n.

• Later on, we’ll make sure that this is also true at those that lie nearby n.

• Before doing this, we have more urgent business: to make sure that the degrees of
freedom at n are correct. For this purpose, we must pick the $\alpha$’s cleverly.

Thanks to symmetry, we may assume here that


\[ \alpha_{i,j} = \alpha_{j,i}, \]


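so we’re actually looking for six unknown $\alpha$’s. To find them, solve the following six
equations at n:


\[
\begin{aligned}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
&= \nabla\nabla^t \psi_{n,5} \\
&= \left(\frac{\partial(\lambda_k,\lambda_l,\lambda_m,\lambda_n)}{\partial(x,y,z)}\right)^t
   \nabla_\lambda\nabla_\lambda^t \psi_{n,5}
   \left(\frac{\partial(\lambda_k,\lambda_l,\lambda_m,\lambda_n)}{\partial(x,y,z)}\right) \\
&= \left(\frac{\partial(\lambda_k,\lambda_l,\lambda_m,\lambda_n)}{\partial(x,y,z)}\right)^t
   \nabla_\lambda\nabla_\lambda^t \left(\lambda_n^3 \sum_{i,j \in \{k,l,m\}} \alpha_{i,j}\lambda_i\lambda_j\right)
   \left(\frac{\partial(\lambda_k,\lambda_l,\lambda_m,\lambda_n)}{\partial(x,y,z)}\right) \\
&= \left(\frac{\partial(\lambda_k,\lambda_l,\lambda_m,\lambda_n)}{\partial(x,y,z)}\right)^t
   \left(\lambda_n^3\, \nabla_\lambda\nabla_\lambda^t \sum_{i,j \in \{k,l,m\}} \alpha_{i,j}\lambda_i\lambda_j\right)
   \left(\frac{\partial(\lambda_k,\lambda_l,\lambda_m,\lambda_n)}{\partial(x,y,z)}\right) \\
&= 2 \left(\frac{\partial(\lambda_k,\lambda_l,\lambda_m,\lambda_n)}{\partial(x,y,z)}\right)^t
   \begin{pmatrix}
   \alpha_{k,k} & \alpha_{k,l} & \alpha_{k,m} & 0 \\
   \alpha_{l,k} & \alpha_{l,l} & \alpha_{l,m} & 0 \\
   \alpha_{m,k} & \alpha_{m,l} & \alpha_{m,m} & 0 \\
   0 & 0 & 0 & 0
   \end{pmatrix}
   \left(\frac{\partial(\lambda_k,\lambda_l,\lambda_m,\lambda_n)}{\partial(x,y,z)}\right) \\
&= 2 \left(\frac{\partial(\lambda_k,\lambda_l,\lambda_m)}{\partial(x,y,z)}\right)^t
   \begin{pmatrix}
   \alpha_{k,k} & \alpha_{k,l} & \alpha_{k,m} \\
   \alpha_{l,k} & \alpha_{l,l} & \alpha_{l,m} \\
   \alpha_{m,k} & \alpha_{m,l} & \alpha_{m,m}
   \end{pmatrix}
   \frac{\partial(\lambda_k,\lambda_l,\lambda_m)}{\partial(x,y,z)}.
\end{aligned}
\]


More explicitly, define


\[
\begin{pmatrix}
\alpha_{k,k} & \alpha_{k,l} & \alpha_{k,m} \\
\alpha_{l,k} & \alpha_{l,l} & \alpha_{l,m} \\
\alpha_{m,k} & \alpha_{m,l} & \alpha_{m,m}
\end{pmatrix}
\equiv \frac{1}{2}
\left(\frac{\partial(\lambda_k,\lambda_l,\lambda_m)}{\partial(x,y,z)}\right)^{-t}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\left(\frac{\partial(\lambda_k,\lambda_l,\lambda_m)}{\partial(x,y,z)}\right)^{-1}.
\]


This way, $\psi_{n,5}$ has the correct degrees of freedom at n as well.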

Still, this is not good enough. To become a proper basis function, this function 
must now be modified. For each nearby side midpoint, subtract a multiple of a 
basis function like that defined in Sect. 9.5.1 in t. Furthermore, for each nearby edge 
midpoint, subtract a multiple of two basis functions like those defined in Sect. 9.5.2 
in t.

This defines our new basis function in t. How to extend it to those neighbor
tetrahedra that share n as their joint corner? In each of these, just repeat the above
procedure, and define $\psi_{n,5}$ as a different polynomial of degree five. In the rest of
the mesh, on the other hand, define it as zero. This extends $\psi_{n,5}$ into a continuous
piecewise-polynomial basis function in the entire mesh.

The same method could also be used to design five more basis functions, corresponding
to the xy-, yy-, xz-, yz-, and zz-partial derivative at n. In fact, to define
$\psi_{n,6}$, $\psi_{n,7}$, $\psi_{n,8}$, $\psi_{n,9}$, and $\psi_{n,10}$, make just a small change in the above: just replace


\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\quad\text{by}\quad
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},\
\begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix},\
\begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix},\
\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix},\
\text{or}\
\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\]


respectively.
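
To see the formula for the $\alpha$’s at work, here is a minimal Python sketch. It relies on one fact: since (x, y, z) = n + S(λ_k, λ_l, λ_m)^t, where the columns of S are the edge vectors k − n, l − n, and m − n, the inverse of the Jacobian ∂(λ_k, λ_l, λ_m)/∂(x, y, z) is just S. So, the matrix of $\alpha$’s is (1/2) S^t E S, with no matrix inversion at all. The names `edge_matrix`, `hessian_coefficients`, and `E_MATRICES` are hypothetical, introduced here for illustration only.

```python
# The six right-hand sides, one per second partial derivative at n.
E_MATRICES = {
    "xx": [[1, 0, 0], [0, 0, 0], [0, 0, 0]],
    "xy": [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    "yy": [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
    "xz": [[0, 0, 1], [0, 0, 0], [1, 0, 0]],
    "yz": [[0, 0, 0], [0, 0, 1], [0, 1, 0]],
    "zz": [[0, 0, 0], [0, 0, 0], [0, 0, 1]],
}


def edge_matrix(k, l, m, n):
    """Return S (a list of rows) whose columns are k - n, l - n, and m - n."""
    cols = [[p[i] - n[i] for i in range(3)] for p in (k, l, m)]
    return [[cols[c][r] for c in range(3)] for r in range(3)]


def transpose(X):
    return [[X[j][i] for j in range(3)] for i in range(3)]


def matmul(X, Y):
    return [[sum(X[i][s] * Y[s][j] for s in range(3)) for j in range(3)]
            for i in range(3)]


def hessian_coefficients(k, l, m, n, E):
    """Return the symmetric 3x3 matrix of alpha's: (1/2) S^t E S."""
    S = edge_matrix(k, l, m, n)
    half_E = [[0.5 * entry for entry in row] for row in E]
    return matmul(matmul(transpose(S), half_E), S)
```

For instance, with n at the origin and k, l, m on the axes at distances 1, 2, and 3, λ_l = y/2, so the yy-related function is λ_n^3 · 2λ_l^2, which indeed behaves like y^2/2 near n.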


9.5.4 Gradient-Related Corner Basis Function 


Likewise, let’s go ahead and design a new basis function, corresponding to the x- 
partial derivative at n: 


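\[ \psi_{n,2} \equiv \lambda_n^4 \left(\alpha_k\lambda_k + \alpha_l\lambda_l + \alpha_m\lambda_m\right). \]


How to find the unknowns $\alpha_k$, $\alpha_l$, and $\alpha_m$? Solve three linear equations at n:


\[
\begin{aligned}
(1, 0, 0) &= \nabla^t \psi_{n,2} \\
&= \left(\nabla_\lambda^t \psi_{n,2}\right)
   \left(\frac{\partial(\lambda_k,\lambda_l,\lambda_m,\lambda_n)}{\partial(x,y,z)}\right) \\
&= \nabla_\lambda^t \left(\lambda_n^4 \left(\alpha_k\lambda_k + \alpha_l\lambda_l + \alpha_m\lambda_m\right)\right)
   \left(\frac{\partial(\lambda_k,\lambda_l,\lambda_m,\lambda_n)}{\partial(x,y,z)}\right) \\
&= \lambda_n^4 \left(\alpha_k\nabla_\lambda^t\lambda_k + \alpha_l\nabla_\lambda^t\lambda_l + \alpha_m\nabla_\lambda^t\lambda_m\right)
   \left(\frac{\partial(\lambda_k,\lambda_l,\lambda_m,\lambda_n)}{\partial(x,y,z)}\right) \\
&= \lambda_n^4 \left(\alpha_k(1, 0, 0, 0) + \alpha_l(0, 1, 0, 0) + \alpha_m(0, 0, 1, 0)\right)
   \left(\frac{\partial(\lambda_k,\lambda_l,\lambda_m,\lambda_n)}{\partial(x,y,z)}\right) \\
&= (\alpha_k, \alpha_l, \alpha_m, 0)
   \left(\frac{\partial(\lambda_k,\lambda_l,\lambda_m,\lambda_n)}{\partial(x,y,z)}\right) \\
&= (\alpha_k, \alpha_l, \alpha_m)\,
   \frac{\partial(\lambda_k,\lambda_l,\lambda_m)}{\partial(x,y,z)}.
\end{aligned}
\]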
More explicitly, define


\[
\begin{pmatrix} \alpha_k \\ \alpha_l \\ \alpha_m \end{pmatrix}
\equiv
\left(\frac{\partial(\lambda_k,\lambda_l,\lambda_m)}{\partial(x,y,z)}\right)^{-t}
\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}.
\]


Still, this is not good enough. To become a proper basis function, this function must
now be modified. For each nearby side midpoint, subtract a multiple of the basis
function defined in Sect. 9.5.1 in t. Furthermore, for each nearby edge midpoint,
subtract a multiple of those two basis functions defined in Sect. 9.5.2 in t. Finally,
subtract a multiple of those six basis functions defined in Sect. 9.5.3:


\[
\begin{aligned}
\psi_{n,2} &\leftarrow \psi_{n,2} - (\psi_{n,2})_{xx}(n)\,\psi_{n,5} \\
\psi_{n,2} &\leftarrow \psi_{n,2} - (\psi_{n,2})_{xy}(n)\,\psi_{n,6} \\
\psi_{n,2} &\leftarrow \psi_{n,2} - (\psi_{n,2})_{yy}(n)\,\psi_{n,7} \\
\psi_{n,2} &\leftarrow \psi_{n,2} - (\psi_{n,2})_{xz}(n)\,\psi_{n,8} \\
\psi_{n,2} &\leftarrow \psi_{n,2} - (\psi_{n,2})_{yz}(n)\,\psi_{n,9} \\
\text{and finally}\quad
\psi_{n,2} &\leftarrow \psi_{n,2} - (\psi_{n,2})_{zz}(n)\,\psi_{n,10}
\end{aligned}
\]


in t. As before, $\psi_{n,2}$ is now extended into a continuous piecewise-polynomial basis
function in the entire mesh.

The same approach can also be used to design two more basis functions, corresponding
to the y- and z-partial derivative at n. In fact, to define $\psi_{n,3}$ and $\psi_{n,4}$, just
replace $(1, 0, 0)^t$ above by $(0, 1, 0)^t$ or $(0, 0, 1)^t$, respectively.
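
The three gradient-related coefficients are even easier to compute. Here is a minimal Python sketch, assuming (as follows from (x, y, z) = n + S(λ_k, λ_l, λ_m)^t) that the inverse Jacobian is the matrix S whose columns are the edge vectors k − n, l − n, and m − n. This way, each coefficient is just an inner product of an edge vector with the required direction. The name `gradient_coefficients` is hypothetical.

```python
def gradient_coefficients(k, l, m, n, direction):
    """Return (alpha_k, alpha_l, alpha_m) = S^t direction.

    `direction` is (1, 0, 0), (0, 1, 0), or (0, 0, 1) for the x-, y-, or
    z-partial derivative at the corner n, respectively.
    """
    # Each row of S^t is one edge vector issuing from the corner n.
    edges = [[p[i] - n[i] for i in range(3)] for p in (k, l, m)]
    return tuple(sum(e[i] * direction[i] for i in range(3)) for e in edges)


# Usage: the x-related coefficients in a tetrahedron with n at the origin.
alphas = gradient_coefficients((1.0, 0.0, 0.0), (0.0, 2.0, 0.0),
                               (0.0, 0.0, 3.0), (0.0, 0.0, 0.0), (1, 0, 0))
```

In this example, λ_k = x, so the function starts as λ_n^4 λ_k, whose x-partial derivative at n is indeed 1.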


9.5.5 Corner Basis Function 


Finally, let’s define the basis function corresponding to the function itself at n: 
\[ \psi_{n,1} \equiv \lambda_n^5. \]


Still, this is not good enough. To become a proper basis function, this function
must now be modified. As in Sect. 9.5.2, for each nearby side midpoint, subtract a
multiple of a side-midpoint basis function in t. Furthermore, for each nearby edge
midpoint, subtract a multiple of two edge-midpoint basis functions in t. Moreover, as
in Sect. 9.5.4, subtract a multiple of six Hessian-related basis functions in t. Finally,
subtract a multiple of three gradient-related basis functions:


\[
\begin{aligned}
\psi_{n,1} &\leftarrow \psi_{n,1} - (\psi_{n,1})_{x}(n)\,\psi_{n,2} \\
\psi_{n,1} &\leftarrow \psi_{n,1} - (\psi_{n,1})_{y}(n)\,\psi_{n,3} \\
\text{and finally}\quad
\psi_{n,1} &\leftarrow \psi_{n,1} - (\psi_{n,1})_{z}(n)\,\psi_{n,4}
\end{aligned}
\]


in t. As before, $\psi_{n,1}$ should now be extended into a continuous piecewise-polynomial
basis function in the entire mesh.


This completes the design of ten new basis functions for n. The same can now be
done for the other corners as well.


9.6 Exercises 


1. Show that the side-midpoint basis function defined in Sect. 9.5.1 is indeed a
   proper basis function: it has only one nonzero degree of freedom.
2. Show that, once extended to the entire mesh, it makes a smooth basis function:
   continuous, piecewise-polynomial, and with a continuous gradient across edges.
3. Show that the edge-midpoint basis function defined in Sect. 9.5.2 is indeed a
   proper basis function: it has only one nonzero degree of freedom.
4. Show that, once extended to the entire mesh, it makes a smooth basis function:
   continuous, piecewise-polynomial, and with a continuous gradient across edges.
5. Show that the Hessian-related corner basis function defined in Sect. 9.5.3 is
   indeed a proper basis function: it has only one nonzero degree of freedom.
6. Show that, once extended to the entire mesh, it makes a smooth basis function:
   continuous, piecewise-polynomial, and with a continuous gradient across edges.
7. Show that the gradient-related corner basis function defined in Sect. 9.5.4 is
   indeed a proper basis function: it has only one nonzero degree of freedom.
8. Show that, once extended to the entire mesh, it makes a smooth basis function:
   continuous, piecewise-polynomial, and with a continuous gradient across edges.
9. Show that the corner basis function defined in Sect. 9.5.5 is indeed a proper basis
   function: it has only one nonzero degree of freedom.
10. Show that, once extended to the entire mesh, it makes a smooth basis function:
    continuous, piecewise-polynomial, and with a continuous gradient across edges.


Part IV 
Finite Elements in 3-D 


To define useful basis functions, one must first have a proper mesh. Consider a three- 
dimensional domain, convex or nonconvex. To approximate it well, design a mesh 
of disjoint (nonoverlapping) tetrahedra. In numerical analysis, these are called finite 
elements [9, 69]. 

In the mesh, we have nodes: the corners of the tetrahedra. The node is the most 
elementary ingredient in the mesh. An individual node may serve as a corner in a 
few tetrahedra. 

If they belong to the same tetrahedron, then the nodes must be connected to each 
other by an edge. An edge is often shared by a few adjacent tetrahedra. There are 
two kinds of edges: a boundary edge could be shared by two tetrahedra. An inner 
edge, on the other hand, must be shared by more tetrahedra. 

Each tetrahedron is bounded by four triangles: its sides or faces. In the mesh, 
there are two kinds of faces: an inner face is shared by two adjacent tetrahedra. A 
boundary face, on the other hand, belongs to one tetrahedron only. 

How to construct the mesh? Start from a coarse mesh that approximates the domain 
poorly, and improve it step by step. For this purpose, refine: split coarse tetrahedra. 
At the same time, introduce new (small) tetrahedra next to the convex parts of the 
boundary, to improve the approximation from the inside. This procedure may then 
repeat time and again iteratively, producing finer and finer meshes at higher and 
higher levels. 

This makes a multilevel hierarchy of finer and finer meshes, approximating the 
original domain better and better. In the end, at the top level, those tetrahedra that 
exceed the domain may drop from the final mesh. This completes the automatic 
algorithm to approximate the original domain well. 

The mesh should be as regular as possible: the tetrahedra should be thick and 
nondegenerate. Furthermore, the mesh should be as convex as possible. Only at the 
top level may the mesh become concave again. A few tricks are introduced to have 
these properties. 

To verify accuracy, numerical integration is then carried out on the fine mesh. For 
this purpose, we use a simple example, for which the analytic integral is well-known 
in advance. The numerical integral is then subtracted from the analytic integral. This 
is the error: it turns out to be very small in magnitude. Furthermore, our regularity 
estimates show that the mesh is rather regular, as required.

Once the basis functions are well-defined in the fine mesh, they can be used to
approximate a given function, defined in the original domain. This is indeed the spline 
problem [5, 11, 17, 28, 41, 51, 58, 75, 76]: design a smooth piecewise-polynomial 
function to “tie” (or match) the original values of the function at the mesh nodes. 

The spline problem could also be formulated as follows: consider a discrete grid 
function, defined at the mesh nodes only. Extend it into a complete spline: a smooth 
piecewise-polynomial function, defined not only at the nodes but also in between. 
The solution must be optimal in terms of minimum “energy.” This is indeed the 
smoothest solution possible. 

Our (regular) finite-element mesh could be used not only in the spline problem
but also in many other practical problems. Later on, we’ll see interesting applications
in modern physics and chemistry.


Chapter 10
Automatic Mesh Generation


Consider a complicated domain in three spatial dimensions. How to store it on the 
computer? For this purpose, it must be discretized: approximated by a discrete mesh, 
ready to be used in practical algorithms. 

To approximate the domain well, best use a mesh of tetrahedra. This way, the 
tetrahedra may take different shapes and sizes. Next to the curved boundary, many 
small tetrahedra should be used to approximate the boundary well. Next to the flat 
part of the boundary, on the other hand, a few big tetrahedra may be sufficient. This 
is indeed local refinement: small tetrahedra should be used only where absolutely 
necessary. 

The automatic refinement algorithm starts from a coarse mesh that approximates 
the domain rather poorly. Then, this mesh refines time and again, producing finer and 
finer meshes that approximate the domain better and better. Indeed, in a finer mesh, 
small tetrahedra may be added next to the curved boundary to approximate it better 
from the inside. This produces a multilevel hierarchy of more and more accurate 
meshes. 

At the intermediate levels, the mesh is often still convex. This is good enough for 
a convex domain, but not for a more complicated, nonconvex domain. Fortunately, at 
the top level, this gets fixed: those tetrahedra that exceed the domain too much drop. 
This way, the finest mesh at the top level gets nonconvex and ready to approximate 
the original nonconvex domain.


10.1 The Refinement Step


10.1.1 Iterative Multilevel Refinement 


What is multilevel refinement? It uses a hierarchy of finer and finer meshes to approx- 
imate the original domain better and better [45, 47, 61]. 

At the bottom level, one may place a rather poor mesh, containing only a few big 
tetrahedra. Don’t worry: the initial coarse mesh will soon refine and improve. This 
is indeed the refinement step, producing the next finer mesh in the next higher level. 

How does the refinement step work? Well, each coarse tetrahedron splits into two 
subtetrahedra. How is this done? Well, a coarse edge is divided into two subedges. 
For this purpose, its original midpoint is connected to those two corners that lie across 
from it. This produces two new subtetrahedra. In the finer mesh, they are going to 
replace the original coarse tetrahedron, providing a better resolution. 
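
The split itself takes just a few lines. Here is a minimal Python sketch, assuming that a tetrahedron is stored as a tuple of four corners, each a tuple (x, y, z). The names `split_tetrahedron` and `volume` are hypothetical, and the volume is included only as a sanity check: the two subtetrahedra should share the original volume equally.

```python
def split_tetrahedron(tet, i, j):
    """Split `tet` at the midpoint of the edge joining corners i and j.

    The midpoint gets connected to the two corners that lie across from
    the edge, which produces two subtetrahedra.
    """
    corners = [tuple(c) for c in tet]
    midpoint = tuple(0.5 * (s + t) for s, t in zip(corners[i], corners[j]))
    first = list(corners)
    first[j] = midpoint    # keeps corner i: (i, midpoint, two other corners)
    second = list(corners)
    second[i] = midpoint   # keeps corner j: (midpoint, j, two other corners)
    return tuple(first), tuple(second)


def volume(tet):
    """Volume of a tetrahedron: |det(b - a, c - a, d - a)| / 6."""
    a, b, c, d = tet
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return abs(det) / 6.0
```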


10.1.2 Conformity 


What happens to those neighbor tetrahedra that share the same coarse edge? To 
preserve conformity, they must split in the same way as well. This way, the above 
midpoint is connected to those corners that lie across from it not only in the original 
tetrahedron but also in each adjacent (edge-sharing) tetrahedron. 
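
To preserve conformity in practice, the split should act on the edge, not on one tetrahedron only. Here is a minimal Python sketch, assuming a hypothetical storage scheme: `nodes` is a list of (x, y, z) tuples, and `tets` is a list of 4-tuples of node indices. The name `refine_edge` is hypothetical as well.

```python
def refine_edge(nodes, tets, edge):
    """Split every tetrahedron that contains `edge` at the edge midpoint.

    `edge` is a pair (p, q) of node indices. The midpoint is appended to
    `nodes` once, so all edge-sharing tetrahedra use the same new node.
    Returns the new list of tetrahedra.
    """
    p, q = edge
    nodes.append(tuple(0.5 * (s + t) for s, t in zip(nodes[p], nodes[q])))
    mid = len(nodes) - 1
    refined = []
    for tet in tets:
        if p in tet and q in tet:
            # Two subtetrahedra: one keeps p, the other keeps q.
            refined.append(tuple(mid if v == q else v for v in tet))
            refined.append(tuple(mid if v == p else v for v in tet))
        else:
            refined.append(tet)  # does not share the edge: left as is
    return refined
```

Because the midpoint is created once and shared, no tetrahedron in the finer mesh is left with a "hanging" node in the middle of one of its edges.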

So far, we’ve been busy splitting tetrahedra. This was done in two possible ways. 
A coarse tetrahedron may split out of its own initiative, to refine. Still, this is not the 
only way: a tetrahedron may also be forced to split not to refine but only to preserve 
conformity and fit to a neighbor tetrahedron that has already refined. 

Still, the refinement step may not only split existing tetrahedra but also introduce 
new ones to improve the approximation next to the convex part of the boundary. 
This is called boundary refinement. This completes the refinement step, producing 
the next finer mesh at the next higher level. This mesh is now ready for the next 
refinement step, and so on. 

The boundary of the original domain may contain two parts: the convex part, 
and the concave (or nonconvex) part. The refinement step distinguishes between the 
two. At the convex part, a few new (small) tetrahedra are introduced to approximate 
the curved boundary better from the inside. At the concave part, on the other hand, 
no new tetrahedron is introduced. On the contrary: those tetrahedra that exceed the 
domain too much even drop in the end. This way, the finest mesh at the top level will 
get accurate again, even for a complicated nonconvex domain.


10.1.3 Regular Mesh


What is a regular mesh? Well, in a regular mesh, the tetrahedra are thick and
nondegenerate [7, 39]. As discussed in Chap. 9, Sect. 9.1.1, this is a most important
property.

Fortunately, at the bottom level, one can often pick a rather regular initial mesh. 
How to make sure that the finer mesh in the next higher level remains regular as 
well? 


10.1.4 How to Preserve Regularity? 


Well, here is the trick. Just before the refinement step, order the coarse tetrahedra 
one by one by maximal edge: a tetrahedron with a longer maximal edge before a 
tetrahedron with a shorter maximal edge. In their new order, scan the tetrahedra one 
by one: refine a tetrahedron only if it is indeed a coarse tetrahedron. Otherwise, leave 
it. This way, each coarse tetrahedron refines only once: only its maximal edge splits 
into two subedges. This way, the original coarse tetrahedron splits into two new 
subtetrahedra, which can no longer split any more in this refinement step. 

Or can’t they? Well, there may still be one exception. Once a coarse tetrahedron 
splits (at its maximal edge, as above), all those neighbor tetrahedra that share this 
edge must split as well to preserve conformity (Sect. 10.1.2). In such a neighbor 
tetrahedron, however, this edge is not necessarily maximal: it could be submaximal. 

Could it really? After all, if the neighbor tetrahedron contained a longer edge, 
then it would have already been listed before and would have already split long ago. 
Thus, it can’t be a coarse tetrahedron: it could only be a subtetrahedron, obtained 
from an earlier split. 

This is not so good: splitting a subtetrahedron at its submaximal edge may produce 
a rather thin (irregular) subsubtetrahedron. As a result, regularity may decrease a 
little. Fortunately, this will soon get fixed: just before the next refinement step, the 
tetrahedra are going to be reordered once again in terms of maximal edge, so this 
problem should probably get fixed in the next refinement step, increasing regularity 
again. 
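
The ordering trick is easy to sketch in Python. Below, a tetrahedron is assumed to be a tuple of four (x, y, z) corners, and the names `max_edge_length` and `order_by_max_edge` are hypothetical. This is only the reordering done just before a refinement step, not the refinement step itself.

```python
import itertools
import math


def max_edge_length(tet):
    """Length of the longest of the six edges of the tetrahedron."""
    return max(math.dist(p, q) for p, q in itertools.combinations(tet, 2))


def order_by_max_edge(tets):
    """List tetrahedra with a longer maximal edge before the others."""
    return sorted(tets, key=max_edge_length, reverse=True)
```

Since `sorted` is stable, tetrahedra with the same maximal edge keep their original relative order.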


10.2 Approximating a 3-D Domain 


10.2.1 Implicit Domain 


The original three-dimensional domain $\Omega \subset \mathbb{R}^3$ may be rather complicated: its
boundary $\partial\Omega$ may be curved, nonstandard, and irregular. In fact, $\partial\Omega$ may be available
only implicitly, in terms of a given real function F(x, y, z):


\[ \partial\Omega \equiv \{(x, y, z) \in \mathbb{R}^3 \mid F(x, y, z) = 0\}. \]


This defines $\partial\Omega$ implicitly as the zero level set of F: the set of points at which F
vanishes.

Assume, for example, that we want $\partial\Omega$ to be a sphere centered at (1/2, 1/2, 1/2).
Assume also that we want this sphere to confine the unit cube:


\[ [0, 1]^3 \subset \mathbb{R}^3. \]


In this case, F could be defined as


\[ F(x, y, z) \equiv \left(x - \frac{1}{2}\right)^2 + \left(y - \frac{1}{2}\right)^2 + \left(z - \frac{1}{2}\right)^2 - \frac{3}{4}. \]


Indeed, with this definition, $\partial\Omega$ is a sphere of radius $\sqrt{3}/2$ around (1/2, 1/2, 1/2).
In fact, F is negative inside the sphere, positive outside it, and zero on the sphere
itself.
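
As a computer function, this F is a one-liner. Here is a minimal Python sketch; the name `F_sphere` is hypothetical.

```python
def F_sphere(x, y, z):
    """Negative inside the sphere, zero on it, and positive outside it."""
    return (x - 0.5) ** 2 + (y - 0.5) ** 2 + (z - 0.5) ** 2 - 0.75
```

Each corner of the unit cube lies at distance $\sqrt{3}/2$ from the center, so F vanishes there: the sphere passes through all eight corners.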

In practice, however, F is rarely available in its closed analytic form. More often,
it is only available as a computer function: for every given x, y, and z, F(x, y, z)
can be calculated on the computer. Let’s see a more interesting example.


10.2.2 Example: A Nonconvex Domain


In the above example, $\Omega$ is convex: the interior of the sphere that confines the unit
cube. Here, on the other hand, we consider a more complicated example, in which
$\Omega$ is no longer convex.

Let $\mathbb{R}^+$ be the nonnegative part of the real axis:


\[ \mathbb{R}^+ \equiv \{x \in \mathbb{R} \mid x \ge 0\} \subset \mathbb{R}. \]


Assume that $\Omega$ lies in the nonnegative “octant” of the three-dimensional Cartesian
space:


\[ \Omega \subset (\mathbb{R}^+)^3 = \{(x, y, z) \in \mathbb{R}^3 \mid x \ge 0,\ y \ge 0,\ z \ge 0\}. \]


This way, every point in $\Omega$ must have three nonnegative coordinates: $x \ge 0$, $y \ge 0$,
and $z \ge 0$.

Still, we are not done yet: we want $\Omega$ to be much smaller than $(\mathbb{R}^+)^3$. For this purpose,
let $R_1$ and $R_2$ be some positive parameters to be specified later ($0 < R_1 < R_2$).
Assume also that $\Omega$ lies in the sphere of radius $R_2$, centered at (0, 0, 0). This way,
every point in $\Omega$ must have a magnitude of $R_2$ or less.

Finally, assume also that $\Omega$ lies outside of the sphere of radius $R_1$. This way, every
point in $\Omega$ must have a magnitude of $R_1$ or more.

In summary, $\Omega$ contains those points with magnitude between $R_1$ and $R_2$, and
with three nonnegative coordinates:


\[ \Omega \equiv \{(x, y, z) \in \mathbb{R}^3 \mid x \ge 0,\ y \ge 0,\ z \ge 0,\ R_1^2 \le x^2 + y^2 + z^2 \le R_2^2\}. \]


This defines $\Omega$ in a closed analytic form: a nonconvex domain.

It would be more interesting, though, to pretend that we don’t have this closed
form available. After all, this is usually the case in practice. Thus, we had better
practice using F, not $\Omega$. For this purpose, how to define F?

Well, from the original definition, the origin is not in $\Omega$. Thus, one could issue a
ray from the origin towards $\Omega$. Clearly, the ray meets $\partial\Omega$ at two points: it enters $\Omega$
through a point of magnitude $R_1$ and leaves through a point of magnitude $R_2$. On the
ray, it makes sense to define F as a parabola that vanishes at these two points, and
has its unique minimum in between. This would indeed define F in $(\mathbb{R}^+)^3$.

So far, we’ve “defined” F in $(\mathbb{R}^+)^3$ only. Still, this is not the end of it. After all, F
must be defined outside $\Omega$ as well. In fact, it must be positive there: it must increase
monotonically away from $\Omega$.

To extend F to the rest of the Cartesian space, one must consider negative coordinates
as well. Each negative coordinate should contribute its absolute value to F
to increase the value of F linearly as the point leaves $(\mathbb{R}^+)^3$. This way, F indeed
increases monotonically away from $\Omega$, as required.

So, we only need to specify the above parabolas explicitly. This will indeed make
one factor in F: a = a(x, y, z). This way, both a and F will indeed vanish on the
round parts of $\partial\Omega$, where the magnitude is either $R_1$ or $R_2$. Still, this is not good
enough. After all, F must vanish on the flat sides of $\Omega$ as well. For this purpose,
F must contain yet another factor: b = b(x, y, z). So, we can already view F as a
product of two functions: F = ab.

What should b look like? Well, each nonzero three-dimensional vector $(x, y, z)^t$
makes three cosines with the x-, y-, and z-axes:


\[ \frac{x}{\sqrt{x^2 + y^2 + z^2}},\quad \frac{y}{\sqrt{x^2 + y^2 + z^2}},\quad\text{and}\quad \frac{z}{\sqrt{x^2 + y^2 + z^2}} \]


(Chap. 2, Sect. 2.3.3). Consider the minimal cosine (in absolute value):


\[ b = b(x, y, z) \equiv \frac{\min(|x|, |y|, |z|)}{\sqrt{x^2 + y^2 + z^2}}. \]


This way, b is positive throughout $(\mathbb{R}^+)^3$, except at the planes x = 0, y = 0, and
z = 0, where it vanishes. Thus, these planes are contained in the zero level set of b:


\[
\begin{aligned}
\partial\left((\mathbb{R}^+)^3\right)
&= \{(x, y, z) \in \mathbb{R}^3 \mid x \ge 0,\ y \ge 0,\ z \ge 0,\ xyz = 0\} \\
&\subset \{(x, y, z) \in \mathbb{R}^3 \mid xyz = 0\} \\
&= \{(x, y, z) \in \mathbb{R}^3 \mid b(x, y, z) = 0\} \cup \{(0, 0, 0)\}.
\end{aligned}
\]


In particular, the zero level set of b contains the flat sides of $\Omega$, as required.
Let us now go ahead and define a explicitly as well:


\[ a = a(x, y, z) \equiv \left(R_1 - \sqrt{x^2 + y^2 + z^2}\right)\left(R_2 - \sqrt{x^2 + y^2 + z^2}\right). \]


This way, in each ray issuing from the origin towards $\Omega$, a makes a parabola, as
required. Each parabola vanishes at two points only: the point of magnitude $R_1$, and
the point of magnitude $R_2$.

Fortunately, b is constant in the ray, so the product ab still makes a parabola in the
ray, as required. Moreover, ab still has a unique minimum in between these points.
For this reason, ab must also have a unique minimum in the entire domain $\Omega$. As a
matter of fact, this minimum is obtained at


\[ (x, y, z) = \frac{1}{\sqrt{3}} \cdot \frac{R_1 + R_2}{2}\,(1, 1, 1). \]


We are now ready to define F in the entire Cartesian space:


\[
F(x, y, z) \equiv
\begin{cases}
ab & \text{if } (x, y, z) \in \Omega \\
|a| & \text{if } (x, y, z) \in (\mathbb{R}^+)^3 \setminus \Omega \\
F(\max(x, 0), \max(y, 0), \max(z, 0)) - \min(x, 0) - \min(y, 0) - \min(z, 0) & \text{if } (x, y, z) \notin (\mathbb{R}^+)^3.
\end{cases}
\]


This way, outside $(\mathbb{R}^+)^3$, F is defined recursively from its value at the nearest point
in $(\mathbb{R}^+)^3$. With this definition, F indeed increases linearly as either x or y or z gets
more and more negative.

This completes the definition of F. Indeed, F is negative in the interior of $\Omega$,
zero on the boundary $\partial\Omega$, and positive outside of $\Omega$. Furthermore, F increases
monotonically away from $\Omega$, as required.
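
The three-case definition translates directly into a computer function. Here is a minimal Python sketch. The values R1 = 1 and R2 = 2 are an arbitrary choice made here for illustration (the text leaves these parameters open), and the helper names `a`, `b`, and `in_omega` are hypothetical.

```python
import math

R1, R2 = 1.0, 2.0  # hypothetical values, picked for illustration only


def a(x, y, z):
    """Parabola in the magnitude r: vanishes at r = R1 and r = R2."""
    r = math.sqrt(x * x + y * y + z * z)
    return (R1 - r) * (R2 - r)


def b(x, y, z):
    """Minimal cosine in absolute value (undefined at the origin)."""
    return min(abs(x), abs(y), abs(z)) / math.sqrt(x * x + y * y + z * z)


def in_omega(x, y, z):
    r2 = x * x + y * y + z * z
    return min(x, y, z) >= 0.0 and R1 * R1 <= r2 <= R2 * R2


def F(x, y, z):
    if min(x, y, z) < 0.0:
        # Outside the nonnegative octant: start from the nearest point in
        # the octant, and add the absolute value of each negative coordinate.
        return (F(max(x, 0.0), max(y, 0.0), max(z, 0.0))
                - min(x, 0.0) - min(y, 0.0) - min(z, 0.0))
    if in_omega(x, y, z):
        return a(x, y, z) * b(x, y, z)
    return abs(a(x, y, z))  # in the octant, but outside the domain
```

Note that b is never evaluated at the origin: the origin is not in the domain, so the second case applies there.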

We thus have a good example to model a realistic case. In fact, we can just pretend
that $\Omega$ has never been disclosed to us explicitly, but only implicitly, in terms of F.
As a matter of fact, we can even pretend that F is available not analytically but
only computationally: given x, y, and z, we have a computer program to calculate
F(x, y, z) for us. As we’ll see below, this is enough to design a proper mesh to
approximate $\Omega$.


10.2.3 How to Find a Boundary Point?


As discussed above, to model a realistic case, we pretend that $\partial\Omega$ is not available
explicitly. What is available is the computer function F: for every given x, y, and z,
F(x, y, z) can be calculated on the computer. Thus, $\partial\Omega$ is only available implicitly,
as the zero level set of F:


\[ (x, y, z) \in \partial\Omega \quad\text{if}\quad F(x, y, z) = 0. \]


Fortunately, this is good enough to find a new boundary point on $\partial\Omega$. How to do
this? Well, for this purpose, one must find some concrete coordinates x, y, and z, for
which


\[ F(x, y, z) = 0. \]


Let $a \in \mathbb{R}^3$ be some initial point. Furthermore, let $d \in \mathbb{R}^3$ be some nonzero direction
vector. The task is to find a boundary point of the form $a + \alpha d$, for some unknown
(nonnegative) scalar $\alpha$.

For this purpose, consider the arrow leading from a to a + d (Fig. 10.1). If F
changes sign over the arrow:


\[ F(a)F(a + d) < 0, \]


then the required boundary point lies in between a and a + d, and can be found by
iterative bisection: split the original arrow into two subarrows. If F changes sign
over the first subarrow:


\[ F(a)F\left(a + \frac{d}{2}\right) < 0, \]


then the boundary point must lie on the first subarrow, in between a and a + d/2. In
this case, the first subarrow is picked to substitute the original arrow:


\[ d \leftarrow \frac{d}{2}. \]


If, on the other hand, F changes sign over the second subarrow:


\[ F\left(a + \frac{d}{2}\right)F(a + d) < 0, \]


then the boundary point must lie on the second subarrow, in between a + d/2 and
a + d. In this case, the second subarrow is picked:


\[ a \leftarrow a + \frac{d}{2} \quad\text{and}\quad d \leftarrow \frac{d}{2}. \]


This procedure repeats time and again, until d gets sufficiently short, so the required
boundary point is found with a sufficient accuracy.

Fig. 10.1 The good case: the arrow leading from a to a + d indeed contains a boundary point in $\partial\Omega$. In this case, F indeed changes sign over the arrow: F(a) < 0 < F(a + d) or F(a) > 0 > F(a + d)

Fig. 10.2 The bad case: the arrow leading from a to a + d is too short, so the boundary $\partial\Omega$ remains ahead of it. The arrow must first shift or stretch forward, until its head passes $\partial\Omega$, as in the previous figure
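
Iterative bisection can be sketched in Python as follows. Here F is assumed to be any computer function of x, y, and z, and points are plain tuples. The name `find_boundary_point` and the tolerance `tol` are hypothetical. One small deviation from the strict inequalities above: the sketch keeps the first subarrow whenever the product is zero or negative, so the rare case of F vanishing exactly at the midpoint is handled as well.

```python
import math


def find_boundary_point(F, a, d, tol=1e-10):
    """Bisect the arrow from a to a + d until it is shorter than tol."""
    a = tuple(float(c) for c in a)
    d = tuple(float(c) for c in d)
    head = tuple(p + q for p, q in zip(a, d))
    if F(*a) * F(*head) > 0.0:
        raise ValueError("F does not change sign over the arrow")
    while math.hypot(*d) > tol:
        d = tuple(0.5 * c for c in d)             # d <- d/2
        mid = tuple(p + q for p, q in zip(a, d))  # the old midpoint
        if F(*a) * F(*mid) > 0.0:
            a = mid  # no sign change over the first subarrow: pick the second
        # otherwise, pick the first subarrow: a stays as is
    return tuple(p + 0.5 * q for p, q in zip(a, d))
```

Each iteration halves the arrow, so an arrow of length 2 shrinks below 1e-10 in about 35 iterations.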

Unfortunately, the situation is not always so benign. Some preparation work may 
be needed before iterative bisection can start. To study such a case, let us return to 
the original a and d. 

In Fig. 10.2, for example, the original arrow is too short. It must first shift (or
stretch) ahead, until its head passes $\partial\Omega$. Only then can iterative bisection start, as
above.
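
How far should the arrow stretch? The text leaves this open. One simple possibility, sketched below in Python, is to double d time and again until F changes sign over the arrow. The doubling strategy, the name `stretch_arrow`, and the bound `max_doublings` are all assumptions made here for illustration, not necessarily the method used in the book.

```python
def stretch_arrow(F, a, d, max_doublings=60):
    """Double d until F changes sign over the arrow from a to a + d."""
    a = tuple(float(c) for c in a)
    d = tuple(float(c) for c in d)
    for _ in range(max_doublings):
        head = tuple(p + q for p, q in zip(a, d))
        if F(*a) * F(*head) <= 0.0:
            return a, d  # the head has passed the boundary: ready to bisect
        d = tuple(2.0 * c for c in d)
    raise RuntimeError("no sign change found: the arrow may miss the boundary")
```

Doubling may skip over a thin part of the domain, so a more careful stretch (by smaller steps) may be needed next to a complicated boundary.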


10.3 Approximating a Convex Boundary 


10.3.1 Boundary Refinement 


Thanks to the above method, we can now find a new boundary point. This can now 
be used in the refinement step. This way, the mesh refines not only in the interior 
of the domain but also next to the curved boundary. For this purpose, a few new 
(small) tetrahedra are introduced next to the convex part of the boundary to improve 
the approximation there from the inside. 

For instance, consider a boundary edge that lies next to the convex part of the 
boundary, with both its endpoints on the boundary. Such an edge must then be shared 
by two boundary triangles that lie next to the boundary as well, with all vertices on 
the boundary. In fact, each boundary triangle may serve as the face in one tetrahedron 
only.

Since the boundary is locally convex there, the boundary edge (and the boundary
triangles) should lie (mostly) inside the domain. In the refinement step, the coarse 
boundary edge may split. In this case, a normal vector issues from its midpoint 
towards the boundary. This normal vector may make a good direction vector that 
points towards a new boundary point, as in Sect. 10.2.3. 

This new boundary point is then connected to five points: three on the original 
boundary edge (two endpoints and one midpoint), and two off it (one vertex in each 
boundary triangle). This adds four new tetrahedra to improve the approximation next 
to the locally convex boundary from the inside. 
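
The four new tetrahedra can be listed explicitly. Here is a minimal Python sketch of one reading of the above description: the new boundary point, together with the edge midpoint, is joined to each of the four subtriangles obtained by splitting the two boundary triangles at the midpoint. The name `boundary_refinement` and its argument names are hypothetical, and the pairing of points into tetrahedra is inferred from the text, not quoted from it.

```python
def boundary_refinement(e0, e1, v1, v2, p):
    """Return the four new tetrahedra next to the boundary.

    e0, e1: endpoints of the boundary edge.
    v1, v2: the off-edge vertex in each of the two boundary triangles.
    p:      the new boundary point, found ahead of the edge midpoint.
    """
    midpoint = tuple(0.5 * (s + t) for s, t in zip(e0, e1))
    return [
        (p, midpoint, e0, v1),
        (p, midpoint, e1, v1),
        (p, midpoint, e0, v2),
        (p, midpoint, e1, v2),
    ]
```

This way, p is indeed connected to five points: e0, e1, the midpoint, v1, and v2.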


10.3.2 Boundary Edge and Triangle


What is a boundary edge? So far, we’ve assumed that a boundary edge must have
both endpoints on $\partial\Omega$. But what happens when $\Omega$ is nonconvex? In this case, the
initial coarse mesh might be quite different from $\Omega$: it might exceed it quite a bit
(Fig. 10.3).

In this case, a standard boundary edge (with both endpoints on $\partial\Omega$) may not do:
its midpoint may lie outside $\Omega$. After all, $\Omega$ may be locally concave there.

In this case, once this edge splits into two subedges, there is no boundary refinement.
But what about the subedge, split in the next refinement step? Well, it is no
longer a boundary edge: it has just one endpoint on $\partial\Omega$, and one endpoint off $\partial\Omega$. For
this reason, in the next refinement step too, the subedge may split, but with no boundary
refinement. This is indeed unfortunate: the domain will never be approximated
accurately there.

To fix this, we must redefine a boundary edge in terms of the current mesh M,
not the original domain $\Omega$. A boundary edge must have both endpoints on $\partial M$, not
necessarily on $\partial\Omega$.

How could this help? Well, consider the above situation once again (Fig. 10.4).
In the first refinement step, there is no change: the original boundary edge is so
coarse that its midpoint lies outside the nonconvex domain, so there is no boundary
refinement there. Fortunately, its subedge is a boundary edge of M, although not of
$\Omega$. As such, since its own midpoint lies well inside $\Omega$, it may indeed split there in
the next refinement step, this time with boundary refinement, as required.

This way, the fine mesh will no longer be convex. Later on, we’ll see that this 
may be risky: in yet finer meshes, tetrahedra may overlap, making a complete mess. 
Fortunately, here M is nonconvex outside the domain only, so there should be no 
risk. 

Thus, from now on, a boundary edge will mean a boundary edge of M, not $\Omega$, and a
boundary triangle will mean a boundary triangle of M as well. For this purpose, we’ll
define a new mechanism to detect such an edge, based on M only, and independent
of $\Omega$ or F.

Fig. 10.3 A boundary edge with both its endpoints on $\partial\Omega$. Unfortunately, its midpoint may still lie outside the nonconvex domain, so no boundary refinement is carried out there. Furthermore, the subedge is no longer a boundary edge: it has an endpoint off $\partial\Omega$. Therefore, no boundary refinement will ever take place

10.3.3 How to Fill a Valley?


In Sect. 10.1.4, we’ve seen that only the maximal edge should split. Or should it? 
Well, sometimes a submaximal edge should split instead to make the fine mesh more 
convex. After all, in Fig. 10.4, we only see a two-dimensional projection. In reality, 
on the other hand, the three-dimensional mesh may suffer from a local concavity: a 
“valley.” 

Consider, for example, the coarse mesh M, as viewed from above (Fig. 10.5). 
Assume that M is flat from above, so this is indeed a real view. Which is the maxi- 
mal edge? Clearly, the oblique one. This is indeed the edge that should split in the 
refinement step. 

Assume that 02 lies above M, closer to your eyes. In this case, there is also a 
boundary refinement: from the edge midpoint, a normal vector issues towards your 
eyes to meet the boundary above it, next to your eye. This is indicated by the ‘©’ in 
Fig. 10.5. This forms a little pyramid, made of four new tetrahedra, to approximate 
the boundary above it better. 
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Fig. 10.4 A boundary edge 
of M, although not of 2. In 
this sense, the left subedge is 
a legitimate boundary edge, 
with both endpoints on 0M, 
although not on 092. Its own 
midpoint lies well inside £2, 
so boundary refinement will 
indeed take place there in the 
next refinement step. 
Although the mesh gets 
slightly nonconvex, this 
should produce no 
overlapping tetrahedra 


Fig. 10.5 The coarse mesh: a view from above. In the refinement step, the oblique
edge splits, and a normal vector issues from its midpoint towards your eyes
to hit the boundary above it

Unfortunately, this also produces a concave valley in between the new pyramids
(Fig. 10.6). How to fill it? Well, in the next refinement step, in each pyramid, split
not the maximal edge but rather the vertical edge along the valley, in between the
pyramids (Fig. 10.7). This way, a normal vector will issue from the middle of the
valley towards your eyes to help fill the valley with four new tetrahedra, next to the
boundary above it.


Thus, the original order in Sect. 10.1.4 must change. First, list those tetrahedra
near a valley: those near a “deep” valley before others. This way, a tetrahedron near
a deep valley splits earlier, as required, and the valley gets filled sooner, even though
the edge along it is submaximal. Later on, we’ll state more clearly what “deep” means.

Fig. 10.6 The next finer mesh: two pyramids, with a concave valley in between

Fig. 10.7 The next refinement step: the vertical edge along the valley, although
submaximal, splits, and a normal vector issues from its midpoint ‘$\odot$’ towards your
eyes to hit the boundary above it, and fill the valley with four new tetrahedra

After these, list those tetrahedra that are near no valley. These are ordered as
before: those with a long edge before others. After all, they should refine at their
maximal edge, as before.


Finally, list those tetrahedra that exceed the (nonconvex) domain too much, regardless
of the length of their edges. After all, they should have actually been dropped,
so they have little business to refine. Later on, we’ll state more clearly what “too
much” means.
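The three-stage ordering above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual implementation: the class `Tet` and its fields `max_edge`, `valley_depth`, and `exceeds_domain` are hypothetical names, and the precise meaning of “deep” and “too much” is left to the sections that define them.

```python
from dataclasses import dataclass

@dataclass
class Tet:
    name: str
    max_edge: float        # length of the longest edge
    valley_depth: float    # hypothetical score; <= 0 means "near no valley"
    exceeds_domain: bool   # True if it exceeds the domain "too much"

def refinement_order(tets):
    """Sort tetrahedra into the three groups described above."""
    def key(t):
        if t.exceeds_domain:
            return (2, 0.0)                # last group: edge length ignored
        if t.valley_depth > 0.0:
            return (0, -t.valley_depth)    # first group: deeper valleys first
        return (1, -t.max_edge)            # middle group: longer edges first
    return sorted(tets, key=key)

tets = [
    Tet("A", max_edge=1.0, valley_depth=0.0, exceeds_domain=False),
    Tet("B", max_edge=0.5, valley_depth=0.3, exceeds_domain=False),
    Tet("C", max_edge=2.0, valley_depth=0.0, exceeds_domain=True),
    Tet("D", max_edge=1.5, valley_depth=0.0, exceeds_domain=False),
    Tet("E", max_edge=0.4, valley_depth=0.9, exceeds_domain=False),
]
order = [t.name for t in refinement_order(tets)]
print(order)
```

Since Python's `sorted` is stable, tetrahedra with equal keys keep their previous relative order.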


10.3.4 How to Find a Boundary Edge? 


In Sect. 10.3.2, we’ve explained why we’re interested in a boundary edge of M, not of
$\Omega$. How to find such an edge? More precisely, given an edge, how to check whether
it is indeed a boundary edge, and find the boundary triangles that share it? After all,
the endpoints of the edge must now lie on $\partial M$, which is not available numerically!

Well, for this purpose, let $t_1 \subset M$ be some tetrahedron in the mesh, and let $e \subset t_1$
be an edge in it. Then, e is shared by two sides $s_1, s_2 \subset t_1$:

$$e = s_1 \cap s_2.$$

To check whether e is a boundary edge or not, let us check whether it belongs to any
boundary triangle. For this purpose, let us search for a neighbor tetrahedron $t_2$ that
shares $s_2$ as its joint face:

$$s_2 = t_1 \cap t_2.$$

Now, $t_2$ must have another face, $s_3$, that also shares e as its joint edge:

$$e = s_1 \cap s_2 \cap s_3.$$

We can now apply the same procedure iteratively to $t_2$ rather than $t_1$ to find yet another
triangle, $s_4$, that shares e as well, and so on. This produces a list of triangles

$$s_1, s_2, s_3, \ldots, s_n$$

that share e as their joint edge:

$$e = s_1 \cap s_2 \cap s_3 \cap \cdots \cap s_n = \bigcap_{i=1}^{n} s_i.$$

Now, if

$$s_n = s_1,$$

then e can’t be a boundary edge: it is surrounded by tetrahedra from all directions,
so it must be away from $\partial M$. If, on the other hand,

$$s_n \ne s_1,$$

then $s_n$ must be a boundary triangle, and e must indeed be a boundary edge.

In this case, how to find the second boundary triangle? Apply the same procedure
once again, only this time interchange the roles of $s_1$ and $s_2$.
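The walk around the edge can be sketched as follows. This is a minimal illustration, assuming a conformal mesh stored as a list of 4-tuples of node indices; the names `face_map`, `walk_around_edge`, and `boundary_triangles` are hypothetical and not taken from the text.

```python
from collections import defaultdict
from itertools import combinations

def face_map(tets):
    """Map each face (a frozenset of 3 node ids) to the tetrahedra that use it."""
    faces = defaultdict(list)
    for i, tet in enumerate(tets):
        for f in combinations(tet, 3):
            faces[frozenset(f)].append(i)
    return faces

def walk_around_edge(tets, faces, start, edge, first_face):
    """Start in tetrahedron `start` with `first_face` behind us, and cross
    face after face around `edge`. The face returned equals `first_face`
    for an interior edge, and is a boundary triangle otherwise."""
    edge, first_face = frozenset(edge), frozenset(first_face)
    t, s_prev = start, first_face
    while True:
        # the other face of t that contains the edge
        (s_next,) = [frozenset(f) for f in combinations(tets[t], 3)
                     if edge <= frozenset(f) and frozenset(f) != s_prev]
        if s_next == first_face:
            return s_next                  # full circle: interior edge
        neighbors = [i for i in faces[s_next] if i != t]
        if not neighbors:
            return s_next                  # nothing across: boundary triangle
        t, s_prev = neighbors[0], s_next

def boundary_triangles(tets, start, edge):
    """Return the two boundary triangles sharing `edge`, or None if interior."""
    faces = face_map(tets)
    s1, s2 = [frozenset(f) for f in combinations(tets[start], 3)
              if frozenset(edge) <= frozenset(f)]
    end_a = walk_around_edge(tets, faces, start, edge, s1)
    if end_a == s1:
        return None
    end_b = walk_around_edge(tets, faces, start, edge, s2)  # roles interchanged
    return end_a, end_b

# Four tetrahedra closing a ring around the edge (0, 1): an interior edge.
ring = [(0, 1, 2, 3), (0, 1, 3, 4), (0, 1, 4, 5), (0, 1, 5, 2)]
closed = boundary_triangles(ring, 0, (0, 1))
# Drop the last tetrahedron: the ring opens, and (0, 1) becomes a boundary edge.
opened = boundary_triangles(ring[:3], 1, (0, 1))
print(closed, opened)
```

The face map is rebuilt on every call only to keep the sketch short; in practice it would be built once per mesh.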


10.3.5 Locally Convex Boundary: Gram–Schmidt Process


Let

$$t = (k, l, m, n) \subset M$$

be some tetrahedron in the mesh, vertexed at corners k, l, m, and n (Fig. 9.1). Let

$$e = (m, n) \subset t$$

be an edge in t. Assume that, in the refinement step, e should split at its midpoint:

$$a \equiv \frac{m + n}{2}.$$

Fig. 10.8 Projection onto the plane perpendicular to the boundary edge e = (m, n).
The direction vector d points from a = (m + n)/2 towards $\partial\Omega$, as required


We must then know: is e a boundary edge? After all, if it is, then boundary refinement
may be required at a (Sect. 10.3.1). Fortunately, we already know how to find out the
answer (Sect. 10.3.4).

Thus, assume that e is indeed a boundary edge:

$$e \subset \partial M.$$

In this case, we also get a bonus: two boundary triangles:

$$\triangle(m, n, u) \quad \text{and} \quad \triangle(m, n, v) \subset \partial M.$$

These will be useful below.
Furthermore, assume also that the domain is locally convex at a:

$$F(a) < 0, \quad \text{so} \quad a \in \Omega \setminus \partial\Omega.$$

In this case, it makes sense to fill the gap between e and $\partial\Omega$ with four new tetrahedra.
For this purpose, issue a normal vector d from a towards $\partial\Omega$ to hit $\partial\Omega$ at the new
boundary point $w \in \partial\Omega$ (Sect. 10.2.3). Then, connect w to five points: m, n, a, u,
and v (Fig. 10.8). This indeed produces four new tetrahedra, to approximate the
boundary better at a from the inside, as required.

Now, how to define the direction vector d? Well, it should point in between u
and v, towards $w \in \partial\Omega$. For this purpose, let

$$\mathbf{e} \equiv \frac{n - m}{\|n - m\|_2}$$

be the unit vector parallel to the edge e. Furthermore, define the differences

$$o \equiv u - a \quad \text{and} \quad p \equiv v - a.$$

Now, project both o and p onto the plane perpendicular to $\mathbf{e}$:

$$o \leftarrow o - (o, \mathbf{e})\,\mathbf{e}$$
$$p \leftarrow p - (p, \mathbf{e})\,\mathbf{e}.$$

Indeed, after these substitutions,

$$(o, \mathbf{e}) = (p, \mathbf{e}) = 0.$$

This is actually a Gram–Schmidt process (Chap. 2, Sect. 2.3.2). Next, normalize both
o and p:

$$o \leftarrow \frac{o}{\|o\|_2}$$
$$p \leftarrow \frac{p}{\|p\|_2}.$$

This produces the picture in Fig. 10.8. Now, let d be the vector product

$$d \equiv (p - o) \times \mathbf{e}.$$

Next, normalize d as well:

$$d \leftarrow \frac{d}{\|d\|_2}.$$

Is d a good direction vector? Well, in Fig. 10.8, assume that $\mathbf{e}$ points into the page,
away from you. Thanks to the right-hand rule, d indeed points away from M, as
required. (See exercises at the end of Chap. 2.)

But what if the orientation is the other way around, and

$$\det\bigl( (k - a \mid \mathbf{e} \mid l - a) \bigr) < 0?$$

In this case, d must reverse:

$$d \leftarrow -d.$$

(See exercises at the end of Chap. 2.)

By now, in either case, d is a good direction vector: it indeed points away from
M, towards $\partial\Omega$, as required.

We can now also tell what a “deep” valley is: if

$$(o + p, d)$$

is large, then the valley along e is indeed deep. In this case, t should be listed early,
and split early at a, filling the valley with four new tetrahedra:

$$(a, w, m, u), \quad (a, w, m, v), \quad (a, w, n, u), \quad \text{and} \quad (a, w, n, v),$$

as required. This helps approximate $\Omega$ better from the inside.

Still, there is a condition. To use these four new tetrahedra, w must be away from
a:

$$\|w - a\| > 10^{-7}\, \|n - m\|.$$

Otherwise, these new tetrahedra are too thin and degenerate and should better be left
out, and not added to M.
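The construction of d, including the orientation test and the valley depth, can be sketched in plain Python. The function name `direction_vector` is hypothetical. The sketch also assumes a labeling that the text does not state explicitly: u is the boundary corner found on the side of k, and v is the one found on the side of l. With this labeling, the determinant test flips d exactly when needed.

```python
import math

def sub(x, y): return tuple(a - b for a, b in zip(x, y))
def add(x, y): return tuple(a + b for a, b in zip(x, y))
def dot(x, y): return sum(a * b for a, b in zip(x, y))
def scale(c, x): return tuple(c * a for a in x)
def unit(x): return scale(1.0 / math.sqrt(dot(x, x)), x)
def cross(x, y):
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])

def direction_vector(k, l, m, n, u, v):
    """Return (d, depth): the unit direction d at the midpoint of (m, n),
    and the inner product (o + p, d) used to rank valleys."""
    a = scale(0.5, add(m, n))
    e = unit(sub(n, m))
    o, p = sub(u, a), sub(v, a)
    # Gram-Schmidt: remove the component along e, then normalize
    o = unit(sub(o, scale(dot(o, e), e)))
    p = unit(sub(p, scale(dot(p, e), e)))
    d = unit(cross(sub(p, o), e))
    # det((k - a | e | l - a)) written as a triple product
    if dot(sub(k, a), cross(e, sub(l, a))) < 0:
        d = scale(-1.0, d)
    return d, dot(add(o, p), d)

m, n = (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)      # the edge runs along the z-axis
k, l = (-1.0, -1.0, 0.0), (1.0, -1.0, 0.0)    # the mesh lies below (y < 0)
d_convex, depth_convex = direction_vector(k, l, m, n, (-1.0, -0.2, 0.0), (1.0, -0.2, 0.0))
d_valley, depth_valley = direction_vector(k, l, m, n, (-1.0, 0.2, 0.0), (1.0, 0.2, 0.0))
d_swapped, _ = direction_vector(l, k, m, n, (1.0, -0.2, 0.0), (-1.0, -0.2, 0.0))
print(d_convex, depth_convex, depth_valley, d_swapped)
```

In this toy configuration, d points upwards (away from the mesh) in all three cases, and the depth is positive only when u and v rise above the edge, as in a valley.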


10.4 Approximating a Nonconvex Domain 


10.4.1 Locally Concave Boundary 


Still, there is one more problem. So far, the domain has been approximated from the
inside, at its (locally) convex part only. At its locally concave (nonconvex) part, on
the other hand, no boundary refinement has been carried out. After all, the midpoint
of the boundary edge often lies outside the domain. So, the approximation is still
poor.

How to fix this? Well, a few strategies have been tested. One method is as follows.
Recall that, in the refinement step, the original coarse tetrahedron

$$t = (k, l, m, n)$$

is replaced by two subtetrahedra:

$$(k, l, m, a) \quad \text{and} \quad (k, l, a, n),$$

where

$$a \equiv \frac{m + n}{2}$$

is the edge midpoint. Here, on the other hand, since $a \notin \Omega$, it might make sense to
replace it by the nearest point $c \in \partial\Omega$. This way, in the refinement step, t is replaced
by two new fine tetrahedra:

$$(k, l, m, c) \quad \text{and} \quad (k, l, c, n).$$

Is this a good fix? Unfortunately not: it might produce overlapping tetrahedra in the
next refinement step, and a complete mess.

A better approach is as follows. Wait until the entire multilevel hierarchy is complete.
Then, fix the top level only: from the finest mesh, drop those tetrahedra that
exceed $\Omega$ too much in the sense of having 3-4 corners outside $\Omega$. A tetrahedron
with only 1-2 corners outside $\Omega$, on the other hand, mustn’t drop. This way, at the
top level, $\Omega$ gets approximated at its concave part as well: not from the inside, but
from the outside.
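The dropping rule is easy to sketch once a level-set function F is available (negative inside the domain, zero on its boundary, positive outside). The code below is an illustration only: the function names are hypothetical, and the unit ball serves as a stand-in domain rather than the nonconvex domain of Sect. 10.2.2.

```python
def corners_outside(tet, F):
    """Count the corners of `tet` at which the level-set function F is positive."""
    return sum(1 for corner in tet if F(*corner) > 0.0)

def drop_exceeding(finest_mesh, F):
    """Keep only the tetrahedra with at most 2 corners outside the domain."""
    return [tet for tet in finest_mesh if corners_outside(tet, F) <= 2]

# Stand-in domain (an assumption of this sketch): the unit ball.
def F_ball(x, y, z):
    return x * x + y * y + z * z - 1.0

mesh = [
    ((0, 0, 0), (0.5, 0, 0), (0, 0.5, 0), (0, 0, 0.5)),   # 0 corners outside
    ((0, 0, 0), (0.5, 0, 0), (2, 0, 0), (0, 2, 0)),       # 2 corners outside
    ((0, 0, 0), (2, 0, 0), (0, 2, 0), (0, 0, 2)),         # 3 corners outside
    ((2, 0, 0), (0, 2, 0), (0, 0, 2), (2, 2, 2)),         # 4 corners outside
]
kept = drop_exceeding(mesh, F_ball)
print(len(kept))
```

This filter is meant for the finest mesh only; the coarser meshes in the hierarchy are left intact.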


10.4.2 Convex Meshes


Why fix only the top level, not the intermediate ones? Well, dropping tetrahedra from 
an intermediate mesh might spoil the special structure required to carry out the next 
refinement step. 

In fact, even in a nonconvex domain, the mesh should better remain as convex 
as possible for as long as possible. Dropping a tetrahedron, on the other hand, may 
produce a “hole” in the mesh, making it highly nonconvex too early. 

What would then happen in the next refinement step? Well, consider a boundary 
edge in such a hole. Recall that this is a boundary edge of the mesh, not necessarily of 
the domain (Sect. 10.3.2). The normal vector issuing from its midpoint (Sect. 10.3.5) 
may then hit the other bank of the hole, producing overlapping tetrahedra, and a 
complete mess. 

This is why no tetrahedron should drop from any intermediate mesh. This way, in 
most of the multilevel hierarchy, the meshes remain as convex as possible. Only from 
the finest mesh may some tetrahedra drop, leaving it as nonconvex as the original 
domain, as required. 


10.5 Exercises 


1. Consider the closed unit cube

   $$\Omega \equiv [0, 1]^3 = \{(x, y, z) \in \mathbb{R}^3 \mid 0 \le x, y, z \le 1\}.$$

2. Show that $\Omega$ is convex.
3. Define the function

   $$F(x, y, z) \equiv \max\left( \left|x - \frac{1}{2}\right|, \left|y - \frac{1}{2}\right|, \left|z - \frac{1}{2}\right| \right) - \frac{1}{2}.$$

4. Show that F is negative in the interior of $\Omega$, zero on its boundary, positive outside
   it, and monotonically increasing away from it.


5. Conclude that $\partial\Omega$ is indeed the zero level set of F, as required.
6. Show that F is monotonically increasing on each ray issuing from the middle of
   $\Omega$ at (1/2, 1/2, 1/2).
7. Write the unit cube as the union of six disjoint tetrahedra.
8. Make sure that this mesh is conformal.
9. Apply a refinement step to this mesh.
10. Make sure that the fine mesh is conformal as well.
11. Assume now that $\Omega$ is the interior of the sphere that confines the unit cube.
12. Show that $\Omega$ is convex.
13. Define F as in Sect. 10.2.1.
14. Show that F is negative in the interior of $\Omega$, zero on its boundary, positive outside
    it, and monotonically increasing away from it.
15. Conclude that $\partial\Omega$ is indeed the zero level set of F, as required.
16. Show that F is monotonically increasing on each ray issuing from the middle of
    $\Omega$ at (1/2, 1/2, 1/2).
17. Now, let $\Omega$ be the nonconvex domain in Sect. 10.2.2. Show that $\Omega$ is indeed the
    intersection of the ball of radius $R_2$, the outside of the ball of radius $R_1$, and
    $(\mathbb{R}^+)^3$.
18. Show that $\Omega$ is indeed nonconvex.
19. What is the convex part of the boundary of $\Omega$?
20. What is the concave (nonconvex) part of the boundary of $\Omega$?
21. What is the flat part of the boundary of $\Omega$?
22. Show that the function F defined in Sect. 10.2.2 is indeed negative in the interior
    of $\Omega$, zero on $\partial\Omega$, positive outside of $\Omega$, and monotonically increasing away
    from $\Omega$.
23. Conclude that $\partial\Omega$ is indeed the zero level set of F, as required.
24. Show that F has a unique minimum on each ray issuing from the origin towards
    $\Omega$.
25. Show that F increases monotonically on each ray issuing from $\partial\Omega$ away from
    $\Omega$.
26. Show that F has a unique minimum in $\Omega$.
27. Find this minimum explicitly.
28. Consider now some mesh of tetrahedra. Show that a boundary triangle serves as
    a face in exactly one tetrahedron.
29. Show that a boundary edge is shared by exactly two faces: the boundary triangles.
30. Show that, if these boundary triangles serve as faces in one and the same
    tetrahedron, then this is the only tetrahedron that uses the above boundary edge.
31. Show that the algorithm in Sect. 10.3.4 indeed tells whether a given edge is a
    boundary edge or not.
32. Show that, if this is indeed a boundary edge, then this algorithm also finds the
    boundary triangles that share it.
33. In this case, how can the second boundary triangle be found as well?
34. Show that the direction vector d defined in Sect. 10.3.5 (in its final form) indeed
    points from the midpoint a towards $\partial\Omega$, in between the boundary triangles.
35. Show that the four new tetrahedra that are added to the mesh at the end of
    Sect. 10.3.5 indeed improve the approximation at the convex part of the boundary
    from the inside.
36. Show that the refinement step preserves conformity.
37. Show that the dropping technique in Sect. 10.4.1 indeed improves the
    approximation at the concave part of the boundary from the outside.
38. Why should this dropping technique be applied to the finest mesh only?

Chapter 11
Mesh Regularity


In Chap. 10, Sects. 10.1.3–10.1.4, we’ve already met the important concept of mesh
regularity, and took it into account in the refinement step. Here, we continue to discuss 
it, and introduce a few reliable tests to estimate it. This way, we can make sure that 
our multilevel refinement is indeed robust: the meshes are not only more and more 
accurate but also fairly regular. 

Some regularity tests could be rather misleading and inadequate. Here, we high- 
light this problem, and avoid it. We are careful to use regularity tests that are genuine 
and adequate. 

Why are tetrahedra so suitable to serve as finite elements in our mesh? Because 
they are flexible: can take all sorts of shapes and sizes. This is the key for an efficient 
local refinement: use small tetrahedra only where absolutely necessary. Still, there is 
a price to pay: regularity must be compromised. After all, to approximate the curved 
boundary well, the tetrahedra must be a little thin. Still, thanks to our tricks, regularity 
decreases only moderately and linearly from level to level. This is not too bad: it is 
unavoidable and indeed worthwhile to compromise some regularity for the sake of 
high accuracy. 


11.1 Angle and Sine in 3-D 


11.1.1 Sine in a Tetrahedron


How to measure mesh regularity? For this purpose, we must first ask: how to measure 
the regularity of one individual tetrahedron? Or, how to measure how thick it is? 
Consider the general tetrahedron

$$t = (k, l, m, n),$$

vertexed at its distinct corners k, l, m, and n (Fig. 9.1). For example, if

$$k = (0, 0, 0)^t$$
$$l = (1, 0, 0)^t$$
$$m = (0, 1, 0)^t$$
$$n = (0, 0, 1)^t,$$

then t is just the unit tetrahedron T in Fig. 8.5.

Recall the 3 x 3 matrix

$$S_t \equiv (l - k \mid m - k \mid n - k),$$

whose columns are the vectors leading from k to the three other corners. As discussed
in Chap. 9, Sect. 9.1.1, $S_t$ is the Jacobian of the mapping that maps T onto t. For
example, if t = T, then $S_t$ is just the 3 x 3 identity matrix: the Jacobian of the
identity mapping.

In a two-dimensional triangle, we already have the sine function, to help estimate 
the individual angles. Could the sine function be extended to a three-dimensional 
tetrahedron as well? After all, unlike the triangle, the tetrahedron has no angles in 
the usual sense! 

For this purpose, let’s define the “sine” of t at some corner, say k. This new sine
will have a value between 0 and 1, to tell us how straight (or right-angled) t is at k:

$$\sin(t, k) \equiv \frac{|\det(S_t)|}{\|l - k\| \cdot \|m - k\| \cdot \|n - k\|}.$$

In other words, take $S_t$, normalize its columns, calculate the determinant, and take
the absolute value.

In the extreme case in which t is degenerate, its sine is as small as 0. In the more
optimistic case in which t is as straight as the unit tetrahedron T, on the other hand,
its sine is as large as 1. Thus, the sine indeed tells us how straight and right-angled t
is at k.
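The definition translates directly into code. The following plain-Python sketch (the helper names `sub`, `norm`, `det3`, and `sine` are ours, chosen for illustration) evaluates the sine at a given corner, and tries it on the unit tetrahedron T and on a degenerate (flat) tetrahedron.

```python
import math

def sub(x, y): return tuple(a - b for a, b in zip(x, y))
def norm(x): return math.sqrt(sum(a * a for a in x))

def det3(c1, c2, c3):
    """Determinant of the 3 x 3 matrix whose columns are c1, c2, c3."""
    return (c1[0] * (c2[1] * c3[2] - c2[2] * c3[1])
            - c2[0] * (c1[1] * c3[2] - c1[2] * c3[1])
            + c3[0] * (c1[1] * c2[2] - c1[2] * c2[1]))

def sine(corner, others):
    """sin(t, corner): |det| of the normalized edge vectors issuing from `corner`."""
    edges = [sub(q, corner) for q in others]
    lengths = [norm(e) for e in edges]
    return abs(det3(*edges)) / (lengths[0] * lengths[1] * lengths[2])

k, l, m, n = (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)   # unit tetrahedron T
at_k = sine(k, (l, m, n))          # the right-angled corner
at_l = sine(l, (k, m, n))          # a sharper corner of the same tetrahedron
flat = sine(k, (l, m, (1, 1, 0)))  # four coplanar corners: degenerate
print(at_k, at_l, flat)
```

Note that the sine depends on the corner: T is perfectly straight at k, but not at its other three corners.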


11.1.2 Minimal Angle


To estimate sin(t, k), let us write it in terms of angles between edges or faces in t.
This may help bound sin(t, k) from below, indicating regularity.

Consider the face $\triangle(k, l, m) \subset t$. In this face, let $\alpha$ be the angle vertexed at k. In
Fig. 11.1, for example, $\triangle(k, l, m)$ is horizontal, so $\alpha$ would be a horizontal angle as
well.

Furthermore, consider the edge (k, n). Consider its orthogonal projection onto
the above face. This produces a new angle between (k, n) and this projection: $\beta$.
In Fig. 11.1, for example, $\beta$ would be the vertical angle between (k, n) and the x-y
plane.

Fig. 11.1 The tetrahedron t: a view from above. It sits on its horizontal base:
$\triangle(k, l, m)$. Its left edge (k, l) and its top corner n make a nonhorizontal face:
$\triangle(k, l, n)$. Between these two faces, there is a vertical angle: $\delta$

Consider now another face: $\triangle(k, l, n) \subset t$. In this face, let $\gamma$ be the angle vertexed
at k. In Fig. 11.1, for example, $\gamma$ is a nonhorizontal angle.

Finally, let $\delta$ be the angle between the above faces (or between their normal
vectors). Then, we have

$$\sin(t, k) = \sin(\alpha) \sin(\beta) = \sin(\alpha) \sin(\gamma) \sin(\delta).$$

Now, how to make sure that t is nondegenerate? Well, require that these angles are
nonzero (have a positive sine). Furthermore, how to make sure that t is quite thick?
Well, require that these angles are far from zero: their sine is bounded from below
by a positive constant. This is indeed the minimal-angle criterion.

Still, this is a rather geometrical criterion. Is there a more algebraic, easily calcu- 
lated criterion? 

Well, let’s try. Assume that t is regular in the sense that

$$\sin(t, k) \ge C$$

for some positive constant 0 < C < 1, independent of k. Then, we also have

$$\sin(\alpha) \ge \sin(\alpha) \sin(\gamma) \sin(\delta) = \sin(t, k) \ge C$$
$$\sin(\delta) \ge \sin(\alpha) \sin(\gamma) \sin(\delta) = \sin(t, k) \ge C,$$

or

$$\alpha \ge \arcsin(C)$$
$$\delta \ge \arcsin(C),$$

where $0 < \arcsin(C) < \pi/2$. In fact, even as $C \to 0$, we still have

$$\alpha \ge \arcsin(C) \sim C$$
$$\delta \ge \arcsin(C) \sim C.$$

Furthermore, since C is independent of k, the same could be done for every angle in
t, either between two distinct edges or between two distinct faces. All such angles
are indeed bounded from below by arcsin(C) > 0. Thus, t is indeed thick enough.

Is the reverse also true? Well, let’s try the other way around: assume now that all
angles like $\alpha$ and $\delta$ are bounded from below by a positive constant $0 < C_A < \pi/2$.
Then, we have

$$\sin(t, k) = \sin(\alpha) \sin(\gamma) \sin(\delta) \ge \sin^3(C_A) > 0.$$

Furthermore, since this is true for any $\alpha$, $\gamma$, and $\delta$, k in the above estimate could also
be replaced by l, m, or n.

Do we have here two equivalent criteria to estimate regularity? Unfortunately not:
as $C_A \to 0$, the latter estimate is too weak, and gives us little information:

$$\sin(t, k) \ge \sin^3(C_A) \sim C_A^3 \ll C_A \ll 1.$$

As discussed below, this kind of “equivalence” may be misleading and inadequate.


11.1.3 Proportional Sine 


Unfortunately, sin(t, k) doesn’t tell us the whole story. After all, even a straight and
right-angled tetrahedron may still be disproportionate and nonsymmetric: the edges
issuing from k may still be different from each other in length. To account for this,
let’s introduce the so-called proportional sine:

$$\text{Psine}(t, k) \equiv \sin(t, k) \cdot \frac{\min(\|l - k\|, \|m - k\|, \|n - k\|)}{\max(\|l - k\|, \|m - k\|, \|n - k\|)}.$$

Still, this is not the end of it. To tell how straight and symmetric t is, we might want
to look at it from the best point of view. So far, we’ve looked at it only from k. There
might, however, be a better direction. This motivates the definition of the maximal
proportional sine:

$$\text{maxPsine}(t) \equiv \max_{q \in \{k, l, m, n\}} \text{Psine}(t, q).$$

For example, in terms of maximal proportional sine, the unit tetrahedron T has the
maximal possible regularity: 1.


11.1.4 Minimal Sine


The maximal proportional sine is an algebraic criterion, easy to calculate on the
computer. Later on, we’ll see that it is actually equivalent to the minimal sine:

$$\text{minSine}(t) \equiv \min_{q \in \{k, l, m, n\}} \sin(t, q).$$

In terms of minimal sine, the most regular tetrahedron is no longer T, but rather the
even equilateral tetrahedron, whose edges have the same length.

In Sect. 11.1.2, we have already seen that minimal sine is “equivalent” to minimal 
angle: a tetrahedron regular in one sense is also regular in the other sense. Still, this 
“equivalence” is inadequate and misleading. The minimal-angle criterion is much 
more robust and reliable (Fig. 11.2). Unfortunately, it is geometrical in nature, and 
not easy to calculate. Later on, we’ll introduce a new regularity estimate that is not 
only robust but also easy to calculate: ball ratio. 
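Both algebraic estimates are easy to compute side by side. The sketch below (the function names `sine_and_psine` and `regularity` are hypothetical) returns the minimal sine and the maximal proportional sine of a tetrahedron, and compares the unit tetrahedron T with an equilateral tetrahedron whose corners are four alternate corners of a cube.

```python
import math

def sub(x, y): return tuple(a - b for a, b in zip(x, y))
def dot(x, y): return sum(a * b for a, b in zip(x, y))
def norm(x): return math.sqrt(dot(x, x))
def cross(x, y):
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])

def sine_and_psine(corner, others):
    """Return (sin(t, corner), Psine(t, corner))."""
    edges = [sub(q, corner) for q in others]
    lengths = [norm(e) for e in edges]
    det = dot(edges[0], cross(edges[1], edges[2]))   # det of the edge matrix
    s = abs(det) / (lengths[0] * lengths[1] * lengths[2])
    return s, s * min(lengths) / max(lengths)

def regularity(tet):
    """Return (minSine, maxPsine) of a tetrahedron given as a tuple of 4 corners."""
    pairs = [sine_and_psine(tet[i], tet[:i] + tet[i + 1:]) for i in range(4)]
    return min(s for s, _ in pairs), max(p for _, p in pairs)

T = ((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1))
equilateral = ((1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1))
min_sine_T, max_psine_T = regularity(T)
min_sine_eq, max_psine_eq = regularity(equilateral)
print(min_sine_T, max_psine_T, min_sine_eq, max_psine_eq)
```

In this experiment, T wins in terms of maximal proportional sine, whereas the equilateral tetrahedron wins in terms of minimal sine, in agreement with the discussion above.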


11.2 Adequate Equivalence 


11.2.1 Equivalent Regularity Estimates 


How thick is t? In the above, we already gave three possible estimates: its minimal 
angle, minimal sine, or maximal proportional sine. Below, we’ll see that these esti- 
mates are not completely independent of each other. On the contrary, they may be 
related, or even equivalent to each other. In fact, minimal sine and maximal propor- 
tional sine are both weak, whereas minimal angle is strong and robust (Fig. 11.2). 

To see this, let us first introduce yet another (weak) regularity estimate, the volume
ratio:

$$\frac{|\det(S_t)/6|}{\text{maxEdge}^3(t)},$$

where maxEdge(t) is the maximal edge length in t. Let’s show that this estimate is
not really new: it is actually equivalent to the maximal proportional sine:

$$\text{maximal proportional sine} \ge \text{constant} \cdot (\text{volume ratio})$$

(which is obvious) and

$$\text{volume ratio} \ge \text{constant} \cdot (\text{maximal proportional sine})$$

(which is not so obvious).

Fig. 11.2 Strong versus weak regularity estimates. Weak estimates (top row): maximal
proportional sine, volume ratio, and minimal sine. Robust estimates (bottom row): ball
ratio and minimal angle. The weak estimates at the top are equivalent to
each other, but inferior to the robust estimates at the bottom


Let’s prove the “not so obvious” bit. For this purpose, consider the maximal edge
in t. It is shared by two faces in t. Each such face contains at least one more edge
that is at least as long as maxEdge(t)/2.

Fortunately, every corner in t belongs to at least one of these faces. Thus, from
every corner in t, there issues at least one edge that is at least as long as maxEdge(t)/2.

Now, in the maximal edge, look at that endpoint from which two long edges issue:
the maximal edge itself, and another edge that is at least as long as maxEdge(t)/2. Look
at the proportional sine at this corner. This completes the proof.

This equivalence is indeed genuine and adequate. The volume ratio is thus not 
really a new estimate: it gives us the same information as does the maximal propor- 
tional sine. Indeed, if t is thick, then both tell us this. If, on the other hand, t is too 
thin, then both tell us this in the same way. 

Below, on the other hand, we’ll see that this is not always the case: two different 
regularity estimates may seem equivalent, but are not. 


11.2.2 Inadequate Equivalence


Unfortunately, the “equivalence” introduced in [7] is not good enough. It uses an
inequality like above, but only for a thick tetrahedron, whose regularity estimate
is bounded from below by a positive constant, not for a thin tetrahedron, whose
regularity estimate approaches zero. This kind of “equivalence” may be rather
misleading and inadequate.

In Sect. 11.1.2, we have already seen an example of an inadequate equivalence:
as $C_A \to 0$, the minimal angle is bounded from below much better than the minimal
sine.

This is also the case with the ball ratio [the radius of the ball inscribed in t, divided
by maxEdge(t)]. The ball ratio and the volume ratio may seem equivalent to each
other, but are not. After all, in the proof in [7], while the ball ratio is well bounded
from below by a positive constant, the volume ratio may get as small as a higher power
of that constant, which is much smaller.

All this is still too theoretical. To establish that an “equivalence” is inadequate, it is
not enough to study its proof: there is a need to design a concrete counterexample of
a limit process in which the tetrahedron t gets less and less regular, yet its regularity
estimates disagree with each other. This is done next.

Consider, for example, the flat tetrahedron vertexed at

$$(-1, 0, 0), \quad (1, 0, 0), \quad (0, -1, \varepsilon), \quad \text{and} \quad (0, 1, \varepsilon),$$

where $\varepsilon > 0$ is a small parameter (Fig. 11.3). Is this tetrahedron regular? Well, for
$\varepsilon \ll 1$, it certainly isn’t: its volume is as small as $\varepsilon$, but all its edges are as long as 1.
Therefore, all regularity estimates agree with each other: they are as small as $\varepsilon \ll 1$.

So, by now, we have no evidence of any inadequacy. The above is no counterexample:
the regularity estimates still agree with each other.

Let’s try and design yet another example: a thin tetrahedron, vertexed at

$$(-1, 0, 0), \quad (1, 0, 0), \quad (0, -\varepsilon, \varepsilon), \quad \text{and} \quad (0, \varepsilon, \varepsilon)$$

(Fig. 11.4). Is this tetrahedron regular? No, it is most certainly not: its volume is
now as small as $\varepsilon^2$. Still, its regularity estimates disagree with each other. Indeed,
it has only one edge as short as $\varepsilon$, and five edges that are as long as 1. For this
reason, its weak regularity estimates lie: its volume ratio, minimal sine, and maximal
proportional sine are as small as $\varepsilon^2$, which is too harsh. Its strong regularity estimates,
on the other hand, are more realistic: its minimal angle and ball ratio are only as small
as $\varepsilon$, not $\varepsilon^2$.
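The different rates can be observed numerically. The following sketch (all function names are ours, for illustration only) evaluates the volume ratio and the ball ratio of the thin tetrahedron for a decreasing parameter. The inscribed radius is computed from the standard identity: six times the volume equals the radius times the sum, over the four faces, of the norm of the cross product of two edge vectors.

```python
import math
from itertools import combinations

def sub(x, y): return tuple(a - b for a, b in zip(x, y))
def dot(x, y): return sum(a * b for a, b in zip(x, y))
def norm(x): return math.sqrt(dot(x, x))
def cross(x, y):
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])

def six_volume(tet):
    """|det(S_t)|: six times the volume of the tetrahedron."""
    k, l, m, n = tet
    return abs(dot(sub(l, k), cross(sub(m, k), sub(n, k))))

def max_edge(tet):
    return max(norm(sub(p, q)) for p, q in combinations(tet, 2))

def volume_ratio(tet):
    return six_volume(tet) / 6.0 / max_edge(tet) ** 3

def ball_ratio(tet):
    # each face contributes the norm of a cross product (twice its area)
    total = sum(norm(cross(sub(b, a), sub(c, a)))
                for a, b, c in combinations(tet, 3))
    return six_volume(tet) / total / max_edge(tet)

def thin(eps):
    return ((-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, -eps, eps), (0.0, eps, eps))

for eps in (0.1, 0.01, 0.001):
    t = thin(eps)
    print(eps, volume_ratio(t), ball_ratio(t))
```

Each time the parameter shrinks by a factor of 10, the volume ratio shrinks by a factor of 100, whereas the ball ratio shrinks by a factor of about 10 only.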

What might happen if a weak regularity estimate was used in a stopping criterion
in multilevel refinement? Well, we might then believe that a particular tetrahedron is
too thin, even when it is not. This might be too pedantic, leading to stopping too early,
and rejecting good legitimate fine meshes.

Picking a smaller stopping threshold is no cure: this might be too loose, leading 
to stopping too late, and accepting flat tetrahedra, for which all regularity tests are 
as good (Fig. 11.3). 

A robust regularity test is thus clearly necessary. To be practical, it must also be
easy to calculate. This is done next.

Fig. 11.3 A flat tetrahedron: a view from above. All regularity estimates (weak
and strong alike) are as good

Fig. 11.4 A thin tetrahedron: a view from above. The weak estimates lie: they are as
small as $\varepsilon^2$, but the true estimate is only as small as $\varepsilon$


11.2.3 Ball Ratio 


Like the minimal angle, the ball ratio is a robust regularity estimate. This is the radius
of the ball inscribed in t, divided by maxEdge(t).

In other words, look at the largest ball that can be contained in t. Clearly, this ball
is tangent to the faces of t from the inside. Denote its center by o, and its radius by
r. Now, take r, and divide by the length of the maximal edge in t.

Unfortunately, this is still a geometrical definition, not easy to calculate. Is there
an algebraic formula, easy to calculate on the computer?

Fortunately, there is. For this purpose, connect o to the four corners of t: k, l, m,
and n. This splits t into four disjoint subtetrahedra.

Clearly, the volume of t is the sum of the volumes of these subtetrahedra. Furthermore,
in each subtetrahedron, the radius issuing from o makes a right angle with
the face that lies across from o. Thus,

$$\begin{aligned}
|\det(S_t)| &= |\det(S_{(o,l,m,n)})| + |\det(S_{(k,o,m,n)})| + |\det(S_{(k,l,o,n)})| + |\det(S_{(k,l,m,o)})| \\
&= r \bigl( \|(m - l) \times (n - l)\| + \|(m - k) \times (n - k)\| \\
&\qquad + \|(l - k) \times (n - k)\| + \|(l - k) \times (m - k)\| \bigr),
\end{aligned}$$

where “×” stands for vector product.
This formula can now be used to calculate r. The ball ratio is then obtained
immediately as r/maxEdge(t).
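As a small check, the formula can be coded directly. In the sketch below, the names `inscribed_radius` and `ball_ratio` are hypothetical; the four cross products are written exactly in the order of the formula above, and the result is tried on the unit tetrahedron T.

```python
import math
from itertools import combinations

def sub(x, y): return tuple(a - b for a, b in zip(x, y))
def dot(x, y): return sum(a * b for a, b in zip(x, y))
def norm(x): return math.sqrt(dot(x, x))
def cross(x, y):
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])

def inscribed_radius(k, l, m, n):
    """Solve |det(S_t)| = r * (sum of four cross-product norms) for r."""
    det = abs(dot(sub(l, k), cross(sub(m, k), sub(n, k))))
    faces = (norm(cross(sub(m, l), sub(n, l)))      # face across from k
             + norm(cross(sub(m, k), sub(n, k)))    # face across from l
             + norm(cross(sub(l, k), sub(n, k)))    # face across from m
             + norm(cross(sub(l, k), sub(m, k))))   # face across from n
    return det / faces

def ball_ratio(k, l, m, n):
    longest = max(norm(sub(p, q)) for p, q in combinations((k, l, m, n), 2))
    return inscribed_radius(k, l, m, n) / longest

T = ((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1))
r = inscribed_radius(*T)
ratio = ball_ratio(*T)
print(r, ratio)
```

For T, three faces contribute 1 each and the oblique face contributes the square root of 3, so r = 1/(3 + √3), and the ball ratio is r divided by √2.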


11.3 Numerical Experiment


11.3.1 Mesh Regularity 


What is the regularity of the entire mesh M? Naturally, it is just the minimum
regularity of the individual tetrahedra in M. Still, not all tetrahedra should be
considered. After all, a tetrahedron with 3-4 corners outside $\Omega$ should be disregarded
(Sect. 10.4.1):

$$\text{minimal sine}(M) \equiv \min_{t \subset M,\ t \text{ has 2-4 corners in } \Omega} \text{minSine}(t),$$

or

$$\text{minimal maxPsine}(M) \equiv \min_{t \subset M,\ t \text{ has 2-4 corners in } \Omega} \text{maxPsine}(t),$$

or

$$\text{minimal ballR}(M) \equiv \min_{t \subset M,\ t \text{ has 2-4 corners in } \Omega} \text{ballR}(t).$$


11.3.2 Numerical Results


To test the quality of our multilevel refinement, we consider the nonconvex domain
in Chap. 10, Sect. 10.2.2, with $R_2 = 1$ and $R_1 = 0.75$. The initial (coarse) mesh is
just a hexahedron: three edge-sharing tetrahedra, each two sharing a face as well.

In our first (dummy) test, only inner splitting is used: no boundary refinement
is carried out at all. This way, all meshes at all levels remain confined to the initial
hexahedron. As a result, $\Omega$ remains poorly approximated.

Why is this test important? Well, it may help filter out the effect of inner splitting 
only: may this affect regularity? 

Fortunately, not much. Indeed, from Table 11.1, it turns out that, in 11 levels, 
regularity decreases by 50% only. Furthermore, different regularity estimates have 
nearly the same value: minimal sine is as large as ball ratio. This tells us that no 
tetrahedron is probably as thin as in Fig. 11.4. 

Since no boundary refinement is used, no new valley is produced. Therefore, 
before each refinement step, the tetrahedra are ordered in terms of maximal edge 
only, as in Chap. 10, Sect. 10.1.4. After all, no valley is filled, so no deep-valley 
criterion is relevant. This approach is used in the next test as well. 

In the second test, we move on to a more interesting implementation: use boundary
refinement as well, to help approximate $\Omega$ better. Still, we don’t use the trick in
Chap. 10, Sect. 10.3.3, as yet: no valley is filled as yet. After all, $\partial\Omega$ is rather smooth,
so the mesh in Fig. 10.6 is only slightly concave, producing no overlapping tetrahedra
in the next refinement steps.

To estimate the accuracy of the mesh, we also report the volume error:

$$\left| \iiint_{\Omega} dx\,dy\,dz - \sum_{t \subset M,\ t \text{ has 2-4 corners in } \Omega} \iiint_{t} dx\,dy\,dz \right|.$$

This is discussed in detail in Chap. 12 below.

From Tables 11.2–11.3, it turns out that it is indeed a good idea to order the
tetrahedra by maximal edge before each refinement step. Although this may require 
more nodes, this is a price worth paying for the sake of better accuracy and regularity. 
Indeed, regularity decreases as slowly as linearly, whereas accuracy improves as fast 
as exponentially. 

In the third and final test, on the other hand, we let the deep-valley criterion 
dominate the maximal-edge criterion. Before each refinement step, the tetrahedra 
are now ordered as in Chap. 10, Sect. 10.3.3. Then, the tetrahedra split in this order 
as well, at the edge of deeper valley, if any. This may help fill the valleys in Fig. 10.7. 

Unfortunately, in the first three levels, regularity drops (Table 11.4). After all, the 
original hexahedron is slightly concave from below, so a submaximal edge may split 
there, leaving a maximal edge coarse and long. Fortunately, in the next higher levels, 
things get better: the regularity remains nearly constant, with a very good accuracy 
in a moderate number of nodes. Thus, in practice, it may make sense to ignore those
old valleys that already exist in the initial mesh.

Table 11.1 The dummy test: no boundary refinement is carried out, so all meshes are confined to
the original hexahedron, with inner refinement only. Three regularity estimates are reported at the
11th level

Ordering strategy        Minimal ballR   Minimal sine   Minimal maxPsine
Leave disordered         0.012           0.007          0.023
Order by maximal edge    0.033           0.056          0.129

Table 11.2 The nonconvex domain: $R_2 = 1$, $R_1 = 0.75$. The tetrahedra are left disordered. No
valley is filled

Level   Nodes   Tetrahedra   Minimal ballR   Minimal sine   Minimal maxPsine   Volume error
1       5       3            0.0548          0.1301         0.2483             0.2305
2       8       6            0.0758          0.1330         0.2456             0.2305
3       14      24           0.0465          0.1045         0.1873             0.1942
4       32      84           0.0307          0.0371         0.0640             0.1358
5       74      276          0.0164          0.0328         0.0774             0.0579
6       173     720          0.0119          0.0328         0.0423             0.0234
7       447     1938         0.0088          0.0146         0.0299             0.0153
8       1080    5187         0.0090          0.0091         0.0164             0.0084
9       2770    13740        0.0060          0.0071         0.0081             0.0037
10      7058    36651        0.0035          0.0036         0.0093             0.0017
11      18116   92792        0.0018          0.0021         0.0059             0.0003


Table 11.3 Before each refinement step, order the tetrahedra by maximal edge: those with a longer
edge before others. No valley is filled

Level  Nodes   Tetrahedra  Minimal ballR  Minimal sine  Minimal maxPsine  Volume error
1      5       3           0.0547         0.1301        0.2483            0.2305
2      8       6           0.0758         0.1330        0.2456            0.2305
3      14      24          0.0465         0.1045        0.1873            0.1942
4      32      84          0.0307         0.0371        0.0640            0.1358
5      83      282         0.0170         0.0371        0.0525            0.0588
6      212     858         0.0233         0.0269        0.0524            0.0279
7      560     2472        0.0103         0.0173        0.0299            0.0145
8      1530    7386        0.0102         0.0148        0.0285            0.0073
9      4297    21516       0.0048         0.0077        0.0134            0.0027
10     11897   61446       0.0048         0.0058        0.0140            0.0006
11     32976   168602      0.0048         0.0062        0.0135            0.00001
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Table 11.4 The deep-valley criterion dominates the maximal-edge criterion. This way, an edge 


along a valley splits early, even though it is submaximal 


Level  Nodes   Tetrahedra  Minimal ballR  Minimal sine  Minimal maxPsine  Volume error
1      5       3           0.0547         0.1301        0.2483            0.2305
2      9       16          0.0094         0.0055        0.0130            0.2199
3      19      50          0.0045         0.0026        0.0039            0.2100
4      48      176         0.0045         0.0026        0.0039            0.1522
5      128     523         0.0049         0.0020        0.0053            0.0726
6      324     1479        0.0048         0.0020        0.0053            0.0252
7      879     4263        0.0049         0.0020        0.0053            0.0132
8      2484    12674       0.0040         0.0020        0.0053            0.0030
9      7197    37603       0.0035         0.0013        0.0053            0.00013


11.4 Exercises 


1. Show that, in terms of maximal proportional sine, the unit tetrahedron T has the
maximal possible regularity: 1.


2. Write the even equilateral tetrahedron explicitly, and calculate its minimal sine 
and its maximal proportional sine. 
3. Show that, in terms of minimal sine, the even equilateral tetrahedron has the 
maximal possible regularity. 
4. Why are both minimal sine and maximal proportional sine not as robust as minimal 
angle or ball ratio? Hint: see Fig. 11.4. 
5. Write (and prove) an explicit formula to calculate the ball ratio. Hint: see 
Sect. 11.2.3. 


Chapter 12
Numerical Integration


Does our multilevel refinement work well? Does it approximate well the original 
domain? To check on this, we use numerical integration. 

Fortunately, our numerical results are encouraging: as the mesh refines, the numer- 
ical integral gets more and more accurate. This indicates that our original algorithm 
is indeed robust and could be used in even more complicated domains. 

Of course, there is a price to pay: regularity must decrease. After all, this is 
why tetrahedra are so suitable to serve as finite elements in our mesh: they are 
flexible, and may come in all sorts of shapes and sizes. This is indeed the key for 
an efficient local refinement: use small tetrahedra only where absolutely necessary. 
Still, to approximate the curved boundary well, some tetrahedra must also be a little 
thin. Fortunately, in our numerical experiments, regularity decreases only moderately 
and linearly from level to level. This is good enough: after all, it is unavoidable and 
indeed worthwhile to compromise some regularity for the sake of high accuracy. 


12.1 Integration in 3-D 


12.1.1 Volume of a Tetrahedron 


Consider again a tetrahedron of the form
t = (k, l, m, n),


vertexed at k, l, m, and n (Fig. 9.1). In Chap. 9, Sect. 9.1.2, we have already seen how
to integrate in t, using an easier calculation in the unit tetrahedron T:


© Springer Nature Switzerland AG 2019 359 
Y. Shapira, Linear Algebra and Group Theory for Physicists 
and Engineers, https://doi.org/10.1007/978-3-030- 17856-7_12 




\[
\iiint_t F(x, y, z)\,dx\,dy\,dz = |\det(S_t)| \iiint_T (F \circ E_t)(x, y, z)\,dx\,dy\,dz,
\]

where $E_t$ maps T onto t, and $S_t$ is its Jacobian. In this chapter, we explain this
formula in some more detail and in a wider context.

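The standard coordinates x, y, and z used in t could also be written in terms of
reference coordinates: $\hat{x}$, $\hat{y}$, and $\hat{z}$ in T (Fig. 8.5). These new coordinates could be
defined implicitly in T by

\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= E_t \begin{pmatrix} \hat{x} \\ \hat{y} \\ \hat{z} \end{pmatrix}
= k + S_t \begin{pmatrix} \hat{x} \\ \hat{y} \\ \hat{z} \end{pmatrix},
\]

where $E_t$ and $S_t$ are as in Chap. 9, Sect. 9.1.1. This way, every point $(x, y, z) \in t$ is
given uniquely in terms of its own reference point $(\hat{x}, \hat{y}, \hat{z}) \in T$.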
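The reference coordinates can now be used to integrate in t. In particular, the
volume of t is

\[
\begin{aligned}
\iiint_t dx\,dy\,dz
&= \iiint_T \left| \det\left( \frac{\partial(x, y, z)}{\partial(\hat{x}, \hat{y}, \hat{z})} \right) \right| d\hat{x}\,d\hat{y}\,d\hat{z} \\
&= \iiint_T |\det(S_t)|\,d\hat{x}\,d\hat{y}\,d\hat{z} \\
&= |\det(S_t)| \iiint_T d\hat{x}\,d\hat{y}\,d\hat{z} \\
&= \frac{|\det(S_t)|}{6}.
\end{aligned}
\]

(See Chap. 8, Sect. 8.9.5, and verify that the volume of T is indeed 1/6.) This result
is particularly useful in the numerical integration below.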
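
The volume formula is easy to try out. Here is a minimal Python sketch, assuming that the columns of $S_t$ are the edge vectors l - k, m - k, and n - k. The helper names det3 and tetra_volume are ours, for illustration only.

```python
def det3(a):
    """Determinant of a 3 x 3 matrix, given as a list of three rows."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def tetra_volume(k, l, m, n):
    """Volume |det(S_t)| / 6 of the tetrahedron t = (k, l, m, n).

    Assumption: S_t is built from the edge vectors l - k, m - k, n - k.
    They are stored here as rows; the determinant of the transpose is the same.
    """
    s_t = [[p[i] - k[i] for i in range(3)] for p in (l, m, n)]
    return abs(det3(s_t)) / 6.0


# The unit tetrahedron T should have volume 1/6.
print(tetra_volume((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)))
```

Stretching T by a factor of 2 in each direction should multiply the volume by 8, which is a handy sanity check.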


12.1.2 Integral in 3-D


We’ve already seen integration in two spatial dimensions (Chap. 8, Sect. 8.7.3). Let’s
extend this to three spatial dimensions as well. For this purpose, we can use the mesh
designed in Chap. 10. All that is left to do is to let the mesh size approach zero:

\[
\text{meshsize}(M) \equiv \max_{t \subset M,\ t \text{ has 2--4 corners in } \Omega} \text{maxEdge}(t) \to 0.
\]

Consider a domain $\Omega \subset \mathbb{R}^3$, and assume that a function f = f(x, y, z) is defined
in it. Assume also that a multilevel hierarchy of meshes is available to approximate
$\Omega$ better and better. Let M be some mesh in this hierarchy.
The following limit, if it indeed exists, defines the integral of f in $\Omega$:

\[
\iiint_\Omega f(x, y, z)\,dx\,dy\,dz
= \lim_{\text{meshsize}(M) \to 0}
\sum_{t = (k, l, m, n) \subset M,\ t \text{ has 2--4 corners in } \Omega}
\frac{|\det(S_t)|}{6} \cdot \frac{f(k) + f(l) + f(m) + f(n)}{4}.
\]
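
On one fixed mesh, the sum inside the limit can be sketched in a few lines of Python. This is only an illustration under our own assumptions: a mesh is a plain list of tetrahedra, each a tuple of four corner points, and the domain is given by a hypothetical predicate inside(x, y, z). The test mesh (the unit cube split into six tetrahedra) is ours as well, not one of the meshes of Chap. 10.

```python
from itertools import permutations


def tetra_volume(k, l, m, n):
    """Volume |det(S_t)| / 6, with S_t built from the edges l-k, m-k, n-k."""
    a, b, c = ([p[i] - k[i] for i in range(3)] for p in (l, m, n))
    det = (a[0] * (b[1] * c[2] - b[2] * c[1])
           - a[1] * (b[0] * c[2] - b[2] * c[0])
           + a[2] * (b[0] * c[1] - b[1] * c[0]))
    return abs(det) / 6.0


def numerical_integral(f, tetrahedra, inside=lambda x, y, z: True):
    """Sum of volume(t) times the average of f over the four corners of t.

    Only tetrahedra with at least two corners in the domain contribute.
    """
    total = 0.0
    for corners in tetrahedra:
        if sum(1 for p in corners if inside(*p)) < 2:
            continue  # fewer than two corners in the domain: skip t
        average = sum(f(*p) for p in corners) / 4.0
        total += tetra_volume(*corners) * average
    return total


def cube_mesh():
    """Hypothetical test mesh: the unit cube, split into six tetrahedra
    that share the diagonal from (0, 0, 0) to (1, 1, 1)."""
    mesh = []
    for order in permutations(range(3)):
        p = [0, 0, 0]
        corners = [tuple(p)]
        for axis in order:
            p[axis] = 1
            corners.append(tuple(p))
        mesh.append(tuple(corners))
    return mesh


print(numerical_integral(lambda x, y, z: x + y + z, cube_mesh()))
```

For a linear integrand, the corner average is exact in each tetrahedron, so this small test should reproduce the exact integral of x + y + z over the unit cube, 3/2, up to rounding.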


12.1.3 Singularity 


But what if f had a singularity? Well, if f is not well-defined at some corner (say k)
in some tetrahedron t, then, in the contribution from t to the above sum, substitute

\[
f(k) \leftarrow \frac{f(l) + f(m) + f(n)}{3}.
\]

If the singularity is not too sharp, then this might help. For example, assume that, at
a distance r from k, f is as small as

\[
|f| \le \text{constant} \cdot r^{-2}.
\]

Assume also that the mesh is fairly regular, so t is rather thick. In this case, f is as
small as

\[
\max\left( |f(l)|, |f(m)|, |f(n)| \right) \le \text{constant} \cdot \text{maxEdge}^{-2}(t).
\]

Thus, the above substitution indeed helps. After all, the contribution from t is also
multiplied by the volume of t, which dominates: it is as small as

\[
\frac{|\det(S_t)|}{6} \le \text{maxEdge}^3(t).
\]
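
The substitution at a singular corner can be sketched as follows. This is a minimal illustration, not a definitive implementation: we assume that f is a Python function that raises an error (or returns a non-finite value) where it is not well-defined, and the name corner_values is ours.

```python
import math


def corner_values(f, corners):
    """Values of f at the four corners of a tetrahedron.

    A missing value (f raises an error or is not finite) is replaced by
    the average of f over the remaining corners. With one singular
    corner, this is the average over the three other corners.
    """
    values = []
    for p in corners:
        try:
            v = f(*p)
        except (ZeroDivisionError, ValueError):
            v = None
        if v is not None and not math.isfinite(v):
            v = None
        values.append(v)
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("f is undefined at all four corners")
    fill = sum(known) / len(known)
    return [fill if v is None else v for v in values]


# Example: f = 1/r is singular at the origin, which is a corner here.
f = lambda x, y, z: 1.0 / math.sqrt(x * x + y * y + z * z)
print(corner_values(f, [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]))
```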


Let’s see some interesting examples. For this purpose, let’s introduce spherical coor- 
dinates in three spatial dimensions. 
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12.2 Changing Variables 


12.2.1 Spherical Coordinates 


We’ve already used spherical coordinates implicitly to construct the unit sphere in
the first place (Chap. 6, Sect. 6.1.4). Here, however, we introduce them fully and
explicitly, and use them more widely.

A nonzero vector $(x, y, z) \in \mathbb{R}^3$ could be written in terms of its unique spherical
coordinates:


• $r > 0$: the magnitude of the vector,

• $-\pi/2 \le \phi \le \pi/2$: the angle between the original vector and its orthogonal
projection onto the x-y plane,

• and $0 \le \theta < 2\pi$: the angle between this projection and the positive part of the
x-axis.


Our new independent variables are now r, $\theta$, and $\phi$. $\theta$ is known as the azimuthal
angle: it is confined to the horizontal x-y plane. $\phi$, on the other hand, measures the
elevation from the x-y plane upwards. Its complementary angle, $\pi/2 - \phi$, is known
as the polar angle between the original vector and the positive part of the z-axis
(Fig. 12.1).

Fig. 12.1 The vector (x, y, z) in its spherical coordinates: r (the magnitude of the vector),
$\phi$ (the angle between the original vector and its projection onto the x-y plane), and
$\theta$ (the angle between this projection and the x-axis)

The original Cartesian coordinates x, y, and z can now be viewed as dependent
variables. After all, they now depend on our new independent variables r, $\theta$, and $\phi$:

\[
\begin{aligned}
x &= r \cos(\phi) \cos(\theta) \\
y &= r \cos(\phi) \sin(\theta) \\
z &= r \sin(\phi).
\end{aligned}
\]
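
These formulas, and their inverse, can be sketched in Python as follows. The function names are ours. The inverse assumes a nonzero vector, and (as a convention of this sketch) returns $\theta$ in the range from 0 to $2\pi$.

```python
import math


def spherical_to_cartesian(r, theta, phi):
    """(r, theta, phi) -> (x, y, z); theta is azimuthal, phi is the elevation."""
    return (r * math.cos(phi) * math.cos(theta),
            r * math.cos(phi) * math.sin(theta),
            r * math.sin(phi))


def cartesian_to_spherical(x, y, z):
    """(x, y, z) -> (r, theta, phi), for a nonzero vector only."""
    r = math.sqrt(x * x + y * y + z * z)
    # Clamp z / r to [-1, 1] to guard against rounding errors.
    phi = math.asin(max(-1.0, min(1.0, z / r)))
    theta = math.atan2(y, x) % (2.0 * math.pi)
    return r, theta, phi


x, y, z = spherical_to_cartesian(2.0, 4.0, 0.5)
print(cartesian_to_spherical(x, y, z))
```

On the vertical line x = y = 0, the sketch returns $\theta$ = 0 simply because this is what atan2 gives there; any other value of $\theta$ would describe the same point.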


Let’s use the new spherical coordinates in integration. 


12.2.2 Partial Derivatives


Since x, y, and z are functions of r, $\theta$, and $\phi$, they also have partial derivatives with
respect to them (Chap. 8, Sects. 8.9.1-8.9.4). For example, the partial derivative of
x with respect to $\theta$, denoted by $\partial x/\partial\theta$, is obtained by keeping r and $\phi$ fixed, and
differentiating x as a function of $\theta$ only. These partial derivatives form the Jacobian.


12.2.3 The Jacobian 


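As discussed in Chap. 8, Sect. 8.9.4, the Jacobian is the matrix of partial derivatives:

\[
\frac{\partial(x, y, z)}{\partial(r, \theta, \phi)}
\equiv \begin{pmatrix}
\partial x/\partial r & \partial x/\partial\theta & \partial x/\partial\phi \\
\partial y/\partial r & \partial y/\partial\theta & \partial y/\partial\phi \\
\partial z/\partial r & \partial z/\partial\theta & \partial z/\partial\phi
\end{pmatrix}
= \begin{pmatrix}
\cos(\phi)\cos(\theta) & -r\cos(\phi)\sin(\theta) & -r\sin(\phi)\cos(\theta) \\
\cos(\phi)\sin(\theta) & r\cos(\phi)\cos(\theta) & -r\sin(\phi)\sin(\theta) \\
\sin(\phi) & 0 & r\cos(\phi)
\end{pmatrix}.
\]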


12.2.4 Determinant of Jacobian 


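As in Chap. 2, Sect. 2.1.1, the determinant of the above Jacobian is

\[
\begin{aligned}
\det\left( \frac{\partial(x, y, z)}{\partial(r, \theta, \phi)} \right)
&= \det \begin{pmatrix}
\cos(\phi)\cos(\theta) & -r\cos(\phi)\sin(\theta) & -r\sin(\phi)\cos(\theta) \\
\cos(\phi)\sin(\theta) & r\cos(\phi)\cos(\theta) & -r\sin(\phi)\sin(\theta) \\
\sin(\phi) & 0 & r\cos(\phi)
\end{pmatrix} \\
&= r^2\cos(\phi) \det \begin{pmatrix}
\cos(\phi)\cos(\theta) & -\sin(\theta) & -\sin(\phi)\cos(\theta) \\
\cos(\phi)\sin(\theta) & \cos(\theta) & -\sin(\phi)\sin(\theta) \\
\sin(\phi) & 0 & \cos(\phi)
\end{pmatrix} \\
&= r^2\cos(\phi) \left( \sin^2(\phi)\left(\sin^2(\theta) + \cos^2(\theta)\right) + \cos^2(\phi)\left(\cos^2(\theta) + \sin^2(\theta)\right) \right) \\
&= r^2\cos(\phi) \left( \sin^2(\phi) + \cos^2(\phi) \right) \\
&= r^2\cos(\phi).
\end{aligned}
\]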


Thanks to this determinant, we can now go ahead and integrate in spherical coordi- 
nates. 
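
The determinant of the Jacobian can also be checked numerically. Here is a small Python sketch: it builds the Jacobian of the spherical coordinates at one sample point, and compares its determinant with $r^2\cos(\phi)$. The sample point and the helper names are arbitrary choices of ours.

```python
import math


def jacobian(r, theta, phi):
    """Matrix of partial derivatives of (x, y, z) with respect to (r, theta, phi)."""
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(phi), math.sin(phi)
    return [[cp * ct, -r * cp * st, -r * sp * ct],
            [cp * st, r * cp * ct, -r * sp * st],
            [sp, 0.0, r * cp]]


def det3(a):
    """Determinant of a 3 x 3 matrix, given as a list of three rows."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


r, theta, phi = 1.7, 2.3, -0.4
# The two printed numbers should agree up to rounding.
print(det3(jacobian(r, theta, phi)), r * r * math.cos(phi))
```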
12.2.5 Integrating a Composite Function 


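Let’s write f as a composite function of the spherical coordinates r, $\theta$, and $\phi$:

\[
f(x, y, z) = f\left( x(r, \theta, \phi), y(r, \theta, \phi), z(r, \theta, \phi) \right) \equiv f(r, \theta, \phi).
\]

This way, we can now go ahead and integrate in spherical (rather than Cartesian)
coordinates:

\[
\begin{aligned}
\iiint_\Omega f(x, y, z)\,dx\,dy\,dz
&= \iiint_\Omega f(r, \theta, \phi) \left| \det\left( \frac{\partial(x, y, z)}{\partial(r, \theta, \phi)} \right) \right| dr\,d\theta\,d\phi \\
&= \iiint_\Omega f(r, \theta, \phi)\, r^2\cos(\phi)\,dr\,d\theta\,d\phi.
\end{aligned}
\]

For this purpose, however, $\Omega$ must be written in spherical coordinates as well. In
some symmetric cases, this is easy to do.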


12.3 Integration in the Meshes 


12.3.1 Integrating in a Ball 


Assume that $\Omega$ is a ball of radius R > 0, centered at the origin:

\[
\begin{aligned}
\Omega &= \left\{ (x, y, z) \mid x^2 + y^2 + z^2 \le R^2 \right\} \\
&= \left\{ (r, \theta, \phi) \mid 0 \le r \le R,\ 0 \le \theta < 2\pi,\ -\frac{\pi}{2} \le \phi \le \frac{\pi}{2} \right\}.
\end{aligned}
\]

Note that, in spherical coordinates, the vertical line x = y = 0 (or $\phi = \pm\pi/2$) must
be excluded from $\Omega$, because $\theta$ is not defined uniquely there, and the Jacobian is
singular there. Fortunately, this line has no effect on the integral: it has zero volume,
or zero three-dimensional measure.

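Assume also that the integrand f depends on r only:

\[
f = f(r).
\]

In this case, the integral of f in $\Omega$ is

\[
\begin{aligned}
\iiint_\Omega f\,dx\,dy\,dz
&= \iiint_\Omega f(r)\, r^2\cos(\phi)\,dr\,d\theta\,d\phi \\
&= \int_0^{2\pi} d\theta \int_{-\pi/2}^{\pi/2} \cos(\phi)\,d\phi \int_0^R f(r)\, r^2\,dr \\
&= 2\pi \left( \sin\left( \frac{\pi}{2} \right) - \sin\left( -\frac{\pi}{2} \right) \right) \int_0^R f(r)\, r^2\,dr \\
&= 4\pi \int_0^R f(r)\, r^2\,dr.
\end{aligned}
\]

Here are some straightforward examples. If f(r) = 1/r, then we have

\[
\iiint_\Omega \frac{dx\,dy\,dz}{\sqrt{x^2 + y^2 + z^2}} = 4\pi \int_0^R r\,dr = 2\pi R^2
\]

(Chap. 8, Sect. 8.5.3). If, on the other hand, $f \equiv 1$ is just the constant integrand, then
we obtain the volume of the ball of radius R:

\[
\iiint_\Omega dx\,dy\,dz = 4\pi \int_0^R r^2\,dr = \frac{4\pi}{3} R^3.
\]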


12.3.2 Stopping Criterion 


The above formula may help calculate the volume error in Chap. 11, Sect. 11.3.2. 
This way, our multilevel refinement proves robust: the volume error indeed decreases 
rapidly as the mesh refines. 

In more general cases, on the other hand, $\Omega$ is not as simple, and its volume is
not available in a closed analytic form. How then could we make sure that our
multilevel refinement works well? Well, on each mesh, calculate the numerical integral.
Then, calculate the difference between each two consecutive numerical integrals: the
numerical integral on this level, minus the numerical integral on the previous level.
Does this difference approach zero (in absolute value) as the mesh refines? If it does,
then we must be on the right track. If, on some level, it fails to decrease, then some
tetrahedra might overlap, so we had better stop refining.
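
Here is one possible way to code this stopping criterion in Python. It is only a sketch: the tolerance, the status names, and the function name are our own choices, and the numbers in the example are made up (they are not taken from the tables in Chap. 11).

```python
def check_convergence(integrals, tolerance):
    """Apply the stopping criterion to a list of numerical integrals.

    integrals[i] is the numerical integral on level i (coarse to fine).
    Returns (level, status), where status is
      "converged": two consecutive integrals differ by at most tolerance,
      "stalled":   the difference failed to decrease, so stop refining,
      "continue":  neither happened yet, so keep refining.
    """
    previous_difference = None
    for level in range(1, len(integrals)):
        difference = abs(integrals[level] - integrals[level - 1])
        if difference <= tolerance:
            return level, "converged"
        if previous_difference is not None and difference >= previous_difference:
            return level, "stalled"
        previous_difference = difference
    return len(integrals) - 1, "continue"


# Made-up numerical integrals, for illustration only.
print(check_convergence([1.0, 1.5, 1.75, 1.875], tolerance=0.2))
print(check_convergence([1.0, 1.5, 1.75, 2.25], tolerance=0.01))
```

In the first example, the differences keep shrinking until they drop below the tolerance. In the second, the last difference grows again, so the sketch reports that refinement should stop.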


12.3.3 Richardson Extrapolation 


To have a yet better numerical integral, one might also want to use Richardson 
extrapolation: take the numerical integral on this level, and multiply it by 4/3. Then, 
take the numerical integral on the previous level, and multiply it by 1/3. Then, subtract 
the latter from the former. The result may improve on the standard numerical integral: 
it is often twice as accurate. Furthermore, like the standard numerical integral in 
Sect. 12.3.2, it could be used in a stopping criterion as well. 
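
In code, this extrapolation is a one-liner. The Python sketch below also shows a case where it is exact: we assume (this is our assumption, not a claim about the meshes above) that the error is proportional to the square of the mesh size, and that the mesh size is halved from level to level. The function name is ours.

```python
def richardson(current, previous):
    """4/3 times the integral on this level, minus 1/3 times the previous one."""
    return (4.0 * current - previous) / 3.0


# Illustration with made-up numbers: exact value 2, error = (mesh size)^2.
previous = 2.0 + 0.5 ** 2   # mesh size 0.5 on the previous level
current = 2.0 + 0.25 ** 2   # mesh size 0.25 on this level
print(richardson(current, previous))
```

Under this assumption, the leading error term cancels completely. In practice, the error is seldom so clean, so the gain is usually more modest.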


12.4 Exercises 


1. As in Sect. 12.3.1, use spherical coordinates to compute the integral

\[
\iiint_\Omega \frac{dx\,dy\,dz}{\sqrt{x^2 + y^2 + z^2}},
\]

where $\Omega$ is the ball of radius $R_2$ (centered at the origin), minus a smaller ball of
radius $R_1$.

2. Let $R_1 \to 0$. What is the limit?

3. Use the above exercises to define and compute the integral

\[
\iiint_\Omega \frac{dx\,dy\,dz}{\sqrt{x^2 + y^2 + z^2}},
\]

where $\Omega$ is the ball of radius $R_2$, centered at the origin.

4. Use spherical coordinates to compute the integral

\[
\iiint_\Omega \frac{dx\,dy\,dz}{x^2 + y^2 + z^2},
\]

where $\Omega$ is the ball of radius $R_2$ (centered at the origin), minus a smaller ball of
radius $R_1$.

5. Let $R_1 \to 0$. What is the limit?

6. Use the above exercises to define and compute the integral

\[
\iiint_\Omega \frac{dx\,dy\,dz}{x^2 + y^2 + z^2},
\]

where $\Omega$ is the ball of radius $R_2$, centered at the origin.

7. Explain why the numerical integral in Sect. 12.1.2 sums contributions only from
those tetrahedra with at least two corners in $\Omega$.

8. Explain why each such tetrahedron contributes the average of f over its four
corners, times its volume.

9. Explain why, at a corner of singularity, the missing value of f should be replaced
by the average of f over the three other corners (in Sect. 12.1.3).


Chapter 13
Spline: Variational Model in Three
Spatial Dimensions


Once our mesh is sufficiently regular and accurate, basis functions (B-splines) can 
be defined in it. What is a basis function? It has the following properties: 


• piecewise polynomial,
• continuous throughout the entire mesh,
• has only one nonzero degree of freedom in the mesh,
• its gradient is continuous across all edges and at all side midpoints,
• its Hessian is continuous at all mesh nodes.


Why are they called basis functions? Because they can combine to form a much more
general function. In this sense, they span a wide space of functions.
Indeed, consider a general function, with the following properties only:


• piecewise polynomial,
• its gradient is continuous at all mesh nodes and all edge- and side midpoints,
• its Hessian is continuous at all mesh nodes.


Why is this a more general function? Because it must meet fewer requirements than 
before. Still, it can be written as a unique linear combination of the basis functions. 
This is why they are called basis functions: they indeed form a new basis for this 
function space. 

What is a spline? It is a function of the above type, splined (nailed) at the mesh 
nodes: at each individual node, it must have some prescribed value. This is called a 
spline. 

In theory, there are many possible splines. After all, the total number of degrees 
of freedom is very big: much bigger than the total number of nodes. Still, we are 
looking for one special spline: the minimum-energy spline. 

Let’s formulate the problem in a slightly different way. The mesh nodes form a 
discrete (nonuniform) grid. The original values in the grid are called Dirichlet data. 
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Together, they form a discrete grid function, defined at the nodes only. The challenge 
is to extend the original grid function, and define it not only at the nodes but also in 
between. This extension should have as little “energy” as possible. 

Fortunately, this problem can be formulated algebraically as a system of linear 
equations. The solution to this system gives us the long vector that contains the 
unknown degrees of freedom in the entire mesh, as required. 


13.1 Expansion in Basis Functions 


13.1.1 Degrees of Freedom 


Thanks to multilevel refinement (Chaps. 10, 11), we now have a fairly regular mesh 
M, which approximates the original three-dimensional domain well. Let N be the 
set of nodes, E the set of edges, and L the set of sides in M. This way, M contains 
|N| nodes, |E| edges, and |L| sides. 

What are the degrees of freedom in M? Well, each node has ten degrees of freedom
(Chap. 9, Sects. 9.5.3-9.5.5), making a total of 10|N| degrees of freedom in M.
Furthermore, each edge midpoint has two degrees of freedom (Chap. 9, Sect. 9.5.2),
making 2|E| more degrees of freedom in M. Finally, each side midpoint has one
degree of freedom (Chap. 9, Sect. 9.5.1), making |L| more degrees of freedom in M.
Thus, in total, M has 

K = 10|N| + 2|E| + |L| 


degrees of freedom. 
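
To see this count at work, here is a short Python sketch. It assumes (our choice of data structure) that the mesh is given as a list of tetrahedra, each a tuple of four node indices. The edges and sides are then collected as unordered pairs and triples of nodes, so that a shared edge or side is counted only once.

```python
from itertools import combinations


def count_degrees_of_freedom(tetrahedra):
    """K = 10|N| + 2|E| + |L| for a mesh given as 4-tuples of node indices."""
    nodes, edges, sides = set(), set(), set()
    for t in tetrahedra:
        nodes.update(t)
        edges.update(frozenset(e) for e in combinations(t, 2))
        sides.update(frozenset(s) for s in combinations(t, 3))
    return 10 * len(nodes) + 2 * len(edges) + len(sides)


# One tetrahedron: 4 nodes, 6 edges, and 4 sides.
print(count_degrees_of_freedom([(0, 1, 2, 3)]))
# Two tetrahedra that share the side (1, 2, 3).
print(count_degrees_of_freedom([(0, 1, 2, 3), (1, 2, 3, 4)]))
```

For a single tetrahedron, this gives 40 + 12 + 4 = 56, the number of degrees of freedom of a polynomial of degree five in three variables.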

For each of these degrees of freedom, we have one basis function: a continuous 
piecewise-polynomial function, with only one nonzero degree of freedom in M. Let’s 
go ahead and use them. 


13.1.2 The Function Space and Its Basis


Why are these functions called basis functions? Because they form a basis! Indeed, 
they make a unique linear combination to help write every given function. 

More precisely, not quite every function, but only a function f with the following 
properties: 


1. f is piecewise polynomial: in each individual tetrahedron $t \subset M$, f is a (different)
polynomial of degree five.
2. In each individual node in N, f has continuous gradient and Hessian.




3. In every edge- and side-midpoint in M, f has continuous nontangential partial 
derivatives. 


As we’ve seen in Chap. 9, Sect. 9.4.2, f must then be rather smooth: it must have a
continuous gradient throughout the entire edge (for all edges in E). Furthermore, f
must also be continuous throughout M (Chap. 9, Sect. 9.4.3).

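Thus, f can be written as a unique linear combination of the basis functions:

\[
\begin{aligned}
f = {} & \sum_{j \in N} \left( c_{j,1}\psi_{j,1} + c_{j,2}\psi_{j,2} + \cdots + c_{j,10}\psi_{j,10} \right) \\
& + \sum_{h = (j+q)/2,\ (j,q) \in E} \left( c_{h,1}\psi_{h,1} + c_{h,2}\psi_{h,2} \right) \\
& + \sum_{w = (j+q+u)/3,\ (j,q,u) \in L} c_w \psi_w.
\end{aligned}
\]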


What are the coefficients in this expansion? Well, they are just the corresponding
degrees of freedom of f. For example, $c_w$ is the nontangential derivative of f at the
side midpoint w, say

\[
c_w = f_x(w)
\]

(Chap. 9, Sects. 9.1.4 and 9.5.2).

Why is the above expansion true? Well, consider some tetrahedron $t \subset M$. Now,
in t, both sides of the above expansion are polynomials of degree five with the
same 56 degrees of freedom. Therefore, they must coincide throughout t (Chap. 9,
Sect. 9.4.1). This is true for every $t \subset M$. Thus, they must also coincide throughout
M, as required.

This means that the basis functions indeed form a basis. After all, every function
f with the above properties can be written as a unique linear combination of them.
In other words, the basis functions span a new function space, containing all these
f’s.

To represent f yet more easily, let’s reindex the basis functions in a simpler way, 
and list them one by one in a row: 


\[
\psi_1, \psi_2, \psi_3, \ldots, \psi_K.
\]

This way, f could be written simply as

\[
f = \sum_{j=1}^K c_j \psi_j.
\]


This form will be the most useful below. 
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13.2. The Stiffness Matrix 


13.2.1 Assemble the Stiffness Matrix 


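To design the optimal spline, we need a new K × K matrix. This is the coefficient
matrix (or the stiffness matrix), denoted by A. Let’s specify its elements:

\[
\begin{aligned}
a_{i,j} &\equiv \iiint_M \nabla^t \psi_j \nabla \psi_i\,dx\,dy\,dz \\
&= \sum_{t \subset M} \iiint_t \nabla^t \psi_j \nabla \psi_i\,dx\,dy\,dz \\
&= \sum_{t \subset M} |\det(S_t)| \iiint_T \nabla^t (\psi_j \circ E_t)\, S_t^{-1} S_t^{-t}\, \nabla (\psi_i \circ E_t)\,dx\,dy\,dz
\end{aligned}
\]

($1 \le i, j \le K$, Chap. 9, Sect. 9.1.3). This way, to calculate A, we can now integrate
in T rather than t.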

How to calculate A efficiently? Fortunately, this could be done iteratively. Initially,
A is just the zero K × K matrix. Then, scan the tetrahedra one by one. Each tetrahedron
$t \subset M$ may then add a new contribution to each matrix element:

\[
a_{i,j} \leftarrow a_{i,j} + |\det(S_t)| \iiint_T \nabla^t (\psi_j \circ E_t)\, S_t^{-1} S_t^{-t}\, \nabla (\psi_i \circ E_t)\,dx\,dy\,dz.
\]

Clearly, this contribution is often zero. Only if both $\psi_i$ and $\psi_j$ are nonzero in t may
the contribution be nonzero. In other words, only if both $\psi_i$ and $\psi_j$ have a nonzero
degree of freedom at a corner or an edge- or side-midpoint in t may the contribution
be nonzero.

Alternatively, one could also use the chain rule in Chap. 9, Sect. 9.2.5, to integrate
in terms of barycentric coordinates. After all, in practice, this integration is in T as
well. This is particularly relevant if the $\psi_i$’s are still available in terms of barycentric
coordinates in each tetrahedron t.

Once all nonzero contributions have been assembled from all tetrahedra, the coef- 
ficient matrix A is ready. Below, we explain why this matrix is indeed relevant to 
design the best spline. 
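
The assembly loop can be sketched in Python. To keep the sketch short, we swap in a much simpler element than the one used in this chapter: piecewise-linear nodal functions (one degree of freedom per node), rather than the polynomials of degree five with 56 degrees of freedom per tetrahedron. With linear functions, the gradients are constant in each tetrahedron, so the integral over t is just the volume of t times their inner product. All names below are ours.

```python
def cross(a, b):
    """Cross product of two 3-D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Inner product of two 3-D vectors."""
    return sum(x * y for x, y in zip(a, b))


def assemble_stiffness(nodes, tetrahedra):
    """Stiffness matrix for piecewise-linear nodal functions (a simplification).

    nodes: list of (x, y, z); tetrahedra: list of 4-tuples of node indices.
    """
    size = len(nodes)
    A = [[0.0] * size for _ in range(size)]  # initially, the zero matrix
    for t in tetrahedra:
        k = nodes[t[0]]
        e1, e2, e3 = ([nodes[v][i] - k[i] for i in range(3)] for v in t[1:])
        det = dot(e1, cross(e2, e3))  # det(S_t), with edges as columns
        # Rows of the inverse of S_t: gradients of the nodal functions
        # attached to the second, third, and fourth corners.
        grads = [[c / det for c in cross(e2, e3)],
                 [c / det for c in cross(e3, e1)],
                 [c / det for c in cross(e1, e2)]]
        # The four nodal functions sum to 1, so their gradients sum to zero.
        grads.insert(0, [-(grads[0][i] + grads[1][i] + grads[2][i])
                         for i in range(3)])
        volume = abs(det) / 6.0
        for a in range(4):
            for b in range(4):
                A[t[a]][t[b]] += volume * dot(grads[a], grads[b])
    return A


nodes = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
for row in assemble_stiffness(nodes, [(0, 1, 2, 3)]):
    print(row)
```

Even in this simplified setting, the resulting matrix is symmetric, and each tetrahedron touches only those entries whose basis functions are nonzero in it.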
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13.2.2. How to Order the Basis Functions? 


How to order the basis functions 


\[
\psi_1, \psi_2, \psi_3, \ldots, \psi_K
\]


one by one in a row? So far, this wasn’t specified! After all, this didn’t really matter. 
Now, let’s go ahead and order them more explicitly, at least blockwise. More pre- 
cisely, let’s assume that those corner basis functions in Chap. 9, Sect. 9.5.5, are listed 
last. 

More formally, let’s split our original index set into two disjoint subsets:

\[
\{1, 2, 3, \ldots, K\} = R \cup G,
\]

where

\[
\begin{aligned}
R &= \{1, 2, 3, \ldots, K - |N|\} \\
G &= \{K - |N| + 1, K - |N| + 2, K - |N| + 3, \ldots, K\}.
\end{aligned}
\]

Assume that G indexes those corner basis functions in Chap. 9, Sect. 9.5.5:

\[
\psi_G \equiv \left\{ \psi_{K-|N|+1}, \psi_{K-|N|+2}, \psi_{K-|N|+3}, \ldots, \psi_K \right\} = \{\psi_j\}_{j \in G}.
\]

This way, R indexes the rest of the basis functions, contained in $\psi_R$. This also splits
the coefficients in Sect. 13.1.2 into two subvectors:

\[
c \equiv (c_1, c_2, c_3, \ldots, c_K)^t = \begin{pmatrix} c_R \\ c_G \end{pmatrix}.
\]

Assume that f in Sect. 13.1.2 is just a grid function: it is only given at the discrete
nodes, not in between. This means that only those |N| degrees of freedom in $c_G$ are
available. Indeed, the coefficient of the corner basis function at node j (denoted here
by $c_j$) is just the value of f there:

\[
c_j = f(j), \quad j \in N
\]

(Chap. 9, Sect. 9.5.5). This splines (or nails) f at the nodes only, leaving it still free
and unspecified in between. To specify f uniquely in between the nodes as well, we
must also specify the rest of the degrees of freedom in $c_R$. This would indeed define
the desired spline: the best extension that still agrees with the original grid function
at the individual nodes.
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13.3 Finding the Optimal Spline 


13.3.1 Minimum Energy 


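What does “best” mean? Well, for this purpose, let’s define the energy of f. In
physics, this is often called kinetic energy (Chap. 14, Sect. 14.8.3):

\[
\begin{aligned}
\text{energy}(f) &\equiv \iiint_M \|\nabla f\|_2^2\,dx\,dy\,dz \\
&= \iiint_M \nabla^t f \nabla f\,dx\,dy\,dz \\
&= \iiint_M \sum_{j=1}^K c_j \nabla^t \psi_j \sum_{i=1}^K c_i \nabla \psi_i\,dx\,dy\,dz \\
&= \sum_{j=1}^K \sum_{i=1}^K c_j c_i \iiint_M \nabla^t \psi_j \nabla \psi_i\,dx\,dy\,dz \\
&= \sum_{j=1}^K \sum_{i=1}^K c_j a_{i,j} c_i \\
&= c^t A c.
\end{aligned}
\]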


To minimize this kind of energy, let’s decompose A cleverly. 


13.3.2 The Schur Complement


Thanks to the above splitting, A has a new block form:

\[
A = \begin{pmatrix} A_{RR} & A_{RG} \\ A_{GR} & A_{GG} \end{pmatrix}.
\]

In practice, no explicit reordering of rows or columns is needed: the above block
form may remain implicit. Still, it is particularly useful to minimize the energy.
Fortunately, A is symmetric. Therefore,

\[
A_{GR} = A_{RG}^t.
\]

Furthermore, $A_{RR}$ is symmetric as well. Moreover, $A_{RR}$ is also positive definite: it
has positive eigenvalues only (see exercises below). Therefore, $A_{RR}$ is nonsingular:
it has a unique inverse matrix $A_{RR}^{-1}$. Moreover, $A_{RR}^{-1}$ is symmetric and positive definite
as well. Thus, A can be written as a triple product:

\[
A = \begin{pmatrix} A_{RR} & A_{RG} \\ A_{GR} & A_{GG} \end{pmatrix}
= \begin{pmatrix} A_{RR} & 0 \\ A_{GR} & I \end{pmatrix}
\begin{pmatrix} A_{RR}^{-1} & 0 \\ 0 & S \end{pmatrix}
\begin{pmatrix} A_{RR} & A_{RG} \\ 0 & I \end{pmatrix},
\]

where I is the identity matrix of order |N|, and S is the Schur complement matrix:

\[
S \equiv A_{GG} - A_{GR} A_{RR}^{-1} A_{RG}.
\]

Thanks to this decomposition, the energy can now be written as

\[
\begin{aligned}
c^t A c
&= c^t \begin{pmatrix} A_{RR} & 0 \\ A_{GR} & I \end{pmatrix}
\begin{pmatrix} A_{RR}^{-1} & 0 \\ 0 & S \end{pmatrix}
\begin{pmatrix} A_{RR} & A_{RG} \\ 0 & I \end{pmatrix} c \\
&= \left( \begin{pmatrix} A_{RR} & A_{RG} \\ 0 & I \end{pmatrix} c \right)^t
\begin{pmatrix} A_{RR}^{-1} & 0 \\ 0 & S \end{pmatrix}
\begin{pmatrix} A_{RR} & A_{RG} \\ 0 & I \end{pmatrix} c \\
&= \begin{pmatrix} A_{RR} c_R + A_{RG} c_G \\ c_G \end{pmatrix}^t
\begin{pmatrix} A_{RR}^{-1} & 0 \\ 0 & S \end{pmatrix}
\begin{pmatrix} A_{RR} c_R + A_{RG} c_G \\ c_G \end{pmatrix} \\
&= \left( A_{RR} c_R + A_{RG} c_G \right)^t A_{RR}^{-1} \left( A_{RR} c_R + A_{RG} c_G \right) + c_G^t S c_G.
\end{aligned}
\]

How to minimize this? Clearly, the latter term is not in our hands to minimize. Indeed,
it is fixed: $c_G$ is given in advance. So, we can only “play” with the former term. How
to minimize it? Best make it zero! Indeed, since $A_{RR}^{-1}$ is symmetric and positive
definite, it would be best to pick

\[
A_{RR} c_R + A_{RG} c_G = \mathbf{0},
\]

or

\[
A_{RR} c_R = -A_{RG} c_G.
\]

Since $A_{RR}$ is nonsingular, this system indeed has a unique solution $c_R$, which can be
found iteratively, as in Chap. 17 in [61]. This produces the entire vector c, including
the new degrees of freedom in $c_R$, required to extend f to the entire mesh. This is
indeed the desired spline.
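
Here is a tiny numerical illustration in Python. It is a sketch under our own assumptions: the matrix A below is a made-up symmetric positive definite 3 × 3 matrix (not a real stiffness matrix), the last index plays the role of G, and the system is solved by plain Gaussian elimination rather than iteratively.

```python
def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting, for a small dense system."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for i in range(col + 1, n):
            factor = a[i][col] / a[col][col]
            for j in range(col, n + 1):
                a[i][j] -= factor * a[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def minimum_energy_extension(A, c_G):
    """Solve A_RR c_R = -A_RG c_G, and return the entire vector c = (c_R, c_G).

    The indices in G are assumed to be listed last.
    """
    n_R = len(A) - len(c_G)
    A_RR = [row[:n_R] for row in A[:n_R]]
    rhs = [-sum(A[i][n_R + j] * c_G[j] for j in range(len(c_G)))
           for i in range(n_R)]
    return solve(A_RR, rhs) + list(c_G)


def energy(A, c):
    """The quadratic form c^t A c."""
    n = len(c)
    return sum(c[i] * A[i][j] * c[j] for i in range(n) for j in range(n))


A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
c = minimum_energy_extension(A, [1.0])
print(c, energy(A, c))
```

In this small example, the Schur complement is the number S = 2 - 2/3 = 4/3, so the minimal energy should be 4/3 as well. Any other choice of the first two components, with the last one fixed at 1, should give a larger energy.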


13.4 Exercises 


1. Show that the coefficient (stiffness) matrix A is indeed symmetric.

2. Conclude that the eigenvalues of A are real. Hint: see Chap. 1, Sect. 1.9.4.

3. Conclude that the eigenvectors of A are orthogonal to each other. Hint: see
Chap. 1, Sect. 1.9.5.

4. Let v be a K-dimensional vector. Show that $v^t A v \ge 0$. Hint: use the very
definition of A to show that $v^t A v$ is just energy.

5. Conclude that the eigenvalues of A are nonnegative. Hint: pick v above as an
eigenvector.

6. Show that $A_{RR}$ (the upper left block in A) is symmetric as well.

7. Conclude that the eigenvalues of $A_{RR}$ are real.

8. Conclude also that the eigenvectors of $A_{RR}$ are orthogonal to each other.

9. Let v be a (K - |N|)-dimensional vector. Show that $v^t A_{RR} v \ge 0$. Hint: add
dummy zero components. In other words, let $\mathbf{0}$ be the |N|-dimensional zero
column vector. This way,

\[
v^t A_{RR} v = \left( v^t, \mathbf{0}^t \right) A \begin{pmatrix} v \\ \mathbf{0} \end{pmatrix} \ge 0.
\]

10. Conclude that the eigenvalues of $A_{RR}$ are nonnegative.

11. Show that, if $v^t A_{RR} v = 0$, then v must be the zero vector. Hint: use the
components of v as degrees of freedom in a new function

\[
g \equiv \sum_{j \in R} v_j \psi_j
\]

with zero energy. In each tetrahedron $t \subset M$, pick a corner $q \in t$. For each point
$(x, y, z) \in t$, use the line integral

\[
g(x, y, z) = g(q) + \int_q^{(x, y, z)} \nabla g \cdot dl = 0 + \int_q^{(x, y, z)} \mathbf{0} \cdot dl = 0.
\]

12. Conclude that, if v is a nonzero vector, then $v^t A_{RR} v > 0$.

13. Conclude that the eigenvalues of $A_{RR}$ are positive.

14. Conclude that $A_{RR}$ is nonsingular: it has a unique inverse matrix $A_{RR}^{-1}$.

15. Show that $A_{RR}^{-1}$ is symmetric as well.

16. Show that $A_{RR}^{-1}$ has the same eigenvectors as $A_{RR}$, with the reciprocal eigenvalues.

17. Let v be a nonzero (K - |N|)-dimensional vector. Show that $v^t A_{RR}^{-1} v > 0$. Hint:
write v as $v = A_{RR} A_{RR}^{-1} v$.

18. Conclude that, to minimize the energy of the spline, $c_R$ should better solve the
linear system

\[
A_{RR} c_R = -A_{RG} c_G.
\]


Part V 
Advanced Applications in Physics and 
Chemistry 


Finally, we combine both linear algebra and group theory in practical applications 
in quantum chemistry and general relativity. First, we use group theory to introduce 
the permutation group and study the determinant of a square (complex) matrix. 
Once the determinant is used in our quantum-mechanical model, we can write the 
expected energy and obtain the Hartree—Fock system: a pseudo-eigenvalue problem. 
Thanks to linear algebra, the (generalized) eigenvectors have a desirable property: 
orthogonality. This is indeed how group theory and linear algebra combine to form 
a complete theory with practical applications. 

We conclude with an interesting application in general relativity: Einstein equa- 
tions. To introduce them, we must use new features in linear algebra: tensors and 
algebraic operations between them. For this purpose, we must introduce a new prin- 
ciple: Einstein’s summation convention. It improves on the standard sums used in 
linear algebra. In fact, it tells us how to raise and lower indices and come up with 
a coherent summation strategy. Thanks to it, the nonlinear system of equations gets 
particularly easy to introduce. 


Chapter 14
Quantum Chemistry: Electronic
Structure


Let’s see how linear algebra and group theory can combine in a practical application 
in quantum chemistry: the electronic structure in an atom or a molecule. Indeed, the 
position of each electron is a random variable: we can never tell it for sure, but only 
at some probability. Likewise, energy and momentum are nondeterministic as well: 
we can never know what they are precisely, but only with some uncertainty. This is 
not because we are ignorant, but because nature is stochastic! 

Fortunately, we can still tell where each electron might be, and how likely this is. 
For this purpose, we need its wave function (orbital). How to find the correct orbital? 
For this purpose, we need to gather the kinetic and potential energy of the electron, 
which come from its electrostatic attraction to the nucleus and repulsion from other 
electrons. Again, this is a random variable: it can never be calculated explicitly, but 
only in terms of expectation. Still, this is good enough: thanks to the power of linear 
algebra, the orbital can be solved for as an eigenvector, with a physical eigenvalue: 
its energy level. 

Each electron may have a different orbital, with a different energy. These orbitals 
are (generalized) eigenvectors of the same Hermitian matrix. Thanks to linear algebra, 
we can make sure that they are indeed orthonormal (with respect to a relevant inner 
product). This is indeed their canonical form. 

Why is this important? Because electrons are often indistinguishable from each 
other. To define their orbitals properly, we must also use group theory. Indeed, thanks 
to the permutation group, the wave function can take the form of a determinant, 
leading to a simple formula for the (expected) energy. Thanks to this model, we 
can now handle indistinguishable electrons of the same spin as well. This leads to 
the Hartree-Fock system [1]. This is indeed how group theory and linear algebra 
combine to form a complete theory, with practical applications. 
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14.1 Wave Function 


14.1.1 Particle and Its Wave Function 


Consider a particle in the three-dimensional Cartesian space. In classical mechanics, 
it has a deterministic position: (x, y, z). In quantum mechanics, on the other hand, 
its position is nondeterministic: a random variable, known at some probability only. 

In Chap. 7, Sect. 7.9.3, we’ve introduced the state v: a grid function, defined on a
uniform m × m × m grid. This tells us the (nondeterministic) position of the particle
in 3-D. How likely is it to be at $(X_{i,i}, X_{j,j}, X_{k,k})$? The probability for this is stored
in v: it is given by $|v_{i,j,k}|^2$.

Now, let’s extend this, and make it not only discrete but also continuous. Instead
of v, let’s talk about a wave function w(x, y, z), defined on the entire
three-dimensional Cartesian space. This way, the particle could now be everywhere: not
only on a discrete grid but also in just any point in 3-D. How likely is it to be at
$(x, y, z) \in \mathbb{R}^3$? The probability for this is just $|w(x, y, z)|^2$.

This makes sense: after all, the position should be a continuous random variable, 
which may take just any value, not necessarily in a discrete grid. 

For this purpose, however, the sums and inner products used in Chap. 7 should
be replaced by integrals. In particular, to make a legitimate probability, w must be
normalized to satisfy

\[
\int\!\!\int\!\!\int |w(x, y, z)|^2\,dx\,dy\,dz = 1,
\]

where each integral sign integrates over one spatial coordinate, from $-\infty$ to $\infty$. This
can be viewed as an extension of the vector norm defined in Chap. 1, Sect. 1.7.2. In
this sense, w has norm 1. Later on, we’ll make sure that this normalization condition
indeed holds.
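
To get a feel for this normalization condition, here is a small Python sketch. It uses a Gaussian as a sample wave function (our own choice, for illustration only), and approximates the triple integral by the midpoint rule in a finite box, outside which the Gaussian is negligible. The box size, the number of steps, and all names are assumptions of this sketch.

```python
import math


def gaussian(x, y, z):
    """Sample wave function: a Gaussian, scaled to have norm 1."""
    return math.pi ** -0.75 * math.exp(-(x * x + y * y + z * z) / 2.0)


def total_probability(w, half_width=6.0, steps=40):
    """Midpoint-rule approximation of the integral of |w|^2 over a finite box."""
    h = 2.0 * half_width / steps
    pts = [-half_width + (i + 0.5) * h for i in range(steps)]
    return h ** 3 * sum(abs(w(x, y, z)) ** 2
                        for x in pts for y in pts for z in pts)


def normalized(w):
    """Rescale w so that the (approximate) integral of |w|^2 is 1."""
    scale = 1.0 / math.sqrt(total_probability(w))
    return lambda x, y, z: scale * w(x, y, z)


print(total_probability(gaussian))
print(total_probability(normalized(lambda x, y, z: 3.0 * gaussian(x, y, z))))
```

Both printed numbers should be very close to 1: the first because the sample Gaussian is already normalized, the second because the rescaling undoes the extra factor of 3.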

So far, we’ve mainly talked about one observable: position. Still, there is yet 
another important observable, which can never be continuous: energy. Indeed, only 
certain energy levels are allowed, and the rest remain nonphysical. This is indeed 
quantum mechanics: energy comes in discrete quantities. 

How to uncover the wave function w? This will be discussed below. For simplicity,
we use atomic units, in which the particle has mass 1, and the Planck constant is 1 as
well.


14.1.2 Two Particles


Consider now two particles that may interact and even collide with each other. In this
case, they don’t have independent wave functions, but just one joint wave function:
$w(r_1, r_2)$, where $r_1 = (x_1, y_1, z_1)$ and $r_2 = (x_2, y_2, z_2)$ are their possible positions.
Once $w(r_1, r_2)$ is solved for, it must also be normalized to satisfy

$$\int\int\int\int\int\int |w(r_1, r_2)|^2 \, dx_1 dy_1 dz_1 \, dx_2 dy_2 dz_2 = 1.$$

This way, $|w(r_1, r_2)|^2$ may indeed serve as a legitimate probability function, to tell
us how likely the particles are to be at $r_1$ and $r_2$ at the same time.

Unfortunately, there are still a few problems with this model. First, what about 
three or four or more particles? The dimension soon gets too high to handle! Besides, 
even with just two particles, the joint wave function is not very informative: it doesn’t 
give us any information about each individual particle on its own. Therefore, it may 
make sense to assume that the particles don’t interact with each other, so their wave 
function can be factored as a product of the form 


$$w(r_1, r_2) = v^{(1)}(r_1) \, v^{(2)}(r_2),$$

where $v^{(1)}(r_1)$ and $v^{(2)}(r_2)$ are the wave functions of the individual particles. Later
on, we’ll improve on this model yet more, to handle not only distinguishable but also 
indistinguishable particles. 


14.2 Electrons in Their Orbitals 


14.2.1 Atom: Electrons in Orbitals 


Consider now a special kind of particle: an electron. More precisely, consider an
atom with M electrons. In particular, look at the nth electron (1 ≤ n ≤ M). It has
a (nondeterministic) position $r_n = (x_n, y_n, z_n)$ in the three-dimensional Cartesian
space.

Where is the electron? We’ll never know for sure! After all, measuring the position
is not a good idea: it can change the original wave function forever, with no return.
Without doing this, the best we can tell is that the electron could be at $r_n$. The
probability for this is $|v^{(n)}(r_n)|^2$, where $v^{(n)}$ is the wave function of the nth electron:
its orbital.

In general, $v^{(n)}$ is a complex function, defined in the entire three-dimensional
Cartesian space. Like every complex number, $v^{(n)}(r_n)$ has a polar decomposition. In
it, what matters is the absolute value. The phase, on the other hand, has no effect
on the probability $|v^{(n)}|^2$. Still, it does play an important role in the dynamics of
the system: it tells us the (linear and angular) momentum of the electron, at least
nondeterministically.

The function $v^{(n)}(r_n)$ is also known as the nth orbital: it tells us where the nth
electron could be found in the atom. Unfortunately, $v^{(n)}$ is not yet known. To uncover
it, we must solve a (generalized) eigenvalue problem.


14.2.2 Potential Energy and Its Expectation


What is the potential energy in the atom? This is a random variable, so we can never
tell it for sure. Fortunately, we can still tell its expectation. For this purpose, assume
that

$$w(r_1, r_2, \ldots, r_M)$$

is the joint wave function of all electrons together. This way, $|w(r_1, r_2, \ldots, r_M)|^2$ is
the probability to find them at $r_1, r_2, \ldots, r_M$ at the same time. Later on, we’ll write w
more explicitly.

The potential energy has a few terms, coming from electrostatics. The first term
comes from attraction to the nucleus (assumed to lie at the origin). To have its
expectation, take the probability $|w|^2$ to find the electrons at a certain position,
multiply by the potential $1/\|r_n\|$ of each electron (with a minus sign, for attraction),
sum over n, and integrate:

$$-\int\int\cdots\int \sum_{n=1}^{M} \frac{|w(r_1, r_2, \ldots, r_M)|^2}{\|r_n\|} \, dx_1 dy_1 dz_1 \, dx_2 dy_2 dz_2 \cdots dx_M dy_M dz_M.$$

This is a 3M-dimensional integral: each integral sign integrates over one individual
spatial coordinate, from −∞ to ∞. Later on, we’ll simplify it considerably.

On top of this, there is yet more potential energy. This adds more terms, coming
from the electrostatic repulsion of every two electrons from each other:

$$\sum_{i=1}^{M} \sum_{n=i+1}^{M} \int\int\cdots\int \frac{|w(r_1, r_2, \ldots, r_M)|^2}{\|r_i - r_n\|} \, dx_1 dy_1 dz_1 \, dx_2 dy_2 dz_2 \cdots dx_M dy_M dz_M.$$

These sums scan all pairs of indices 1 ≤ i, n ≤ M. Still, only if i < n does the pair
appear. After all, a pair must appear just once, not twice.
These are the Coulomb integrals. Let’s go ahead and simplify them.


14.3 Distinguishable Electrons 


14.3.1 Hartree Product 


Unfortunately, w is not informative enough: it mixes different orbitals with each
other. We might want to separate variables, and write w as a product of orbitals:

$$w(r_1, r_2, \ldots, r_M) = v^{(1)}(r_1) \, v^{(2)}(r_2) \cdots v^{(M)}(r_M).$$

This is the Hartree product. Thanks to it, we’ll have more information about the nth
individual electron, and how likely it is to be at $r_n$. In fact, the probability for this is
just $|v^{(n)}(r_n)|^2$.

Later on, we’ll see that this kind of factorization is possible only for distinguishable
electrons. Indistinguishable electrons, on the other hand, must have a more complicated
wave function. Still, for the time being, let’s assume that this factorization is
valid.


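14.3.2 Potential Energy of Hartree Product


Assume now that each individual orbital indeed makes a legitimate probability
function:

$$\int\int\int |v^{(i)}(r)|^2 \, dx\,dy\,dz = 1, \quad 1 \le i \le M.$$

In this case, the expectation of the potential energy of the Hartree product simplifies
to read

$$-\sum_{n=1}^{M} \int\int\int \frac{|v^{(n)}(r)|^2}{\|r\|} \, dx\,dy\,dz
+ \sum_{i=1}^{M} \sum_{n=i+1}^{M} \int\int\int\int\int\int |v^{(i)}(\tilde r)|^2 \frac{1}{\|r - \tilde r\|} |v^{(n)}(r)|^2 \, dx\,dy\,dz\,d\tilde x\,d\tilde y\,d\tilde z.$$

Indeed, in each term, every orbital that doesn’t appear in the integrand integrates to 1,
and drops. Here, both $r = (x, y, z)$ and $\tilde r = (\tilde x, \tilde y, \tilde z)$ are dummy variables, integrated upon in
the latter six-dimensional integral.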


14.4 Indistinguishable Electrons 


14.4.1 Indistinguishable Electrons 


Unfortunately, two electrons can be distinguished from each other only if they have 
a different spin: one has spin-up, and the other has spin-down. (See exercises at the 
end of Chap. 7.) If, on the other hand, they have the same spin, then they can never 
be distinguished from each other. 

Thus, it would make sense to place our electrons in two disjoint subsets. For this
purpose, let 0 < L < M be a new integer number. Now, assume that the first L
electrons have spin-up, and the remaining M − L electrons have spin-down.

Let’s focus on the first L electrons. What is their joint wave function? Well, it
can no longer be a simple Hartree product. After all, their indistinguishability must
be reflected in their wave function.


14.4.2 Pauli’s Exclusion Principle: Slater Determinant 


Thus, the first L electrons must have a more complicated wave function: a determinant
of a new L × L matrix:

$$\frac{1}{\sqrt{L!}} \det\left( \left( v^{(n)}(r_i) \right)_{1 \le i, n \le L} \right).$$

This way, they also satisfy Pauli’s exclusion principle: two electrons can never have
the same state (same spin and also same orbital). Indeed, in this case, the above
matrix would have two identical columns, so its determinant would vanish. This is
called the Slater determinant. To study it, we must redefine the determinant from scratch,
and study its algebraic properties. For this purpose, group theory comes in handy.
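
The two properties above (a sign flip when two electrons are interchanged, and a vanishing wave function when two electrons share an orbital) can be checked numerically. The following is only a minimal sketch: the one-dimensional toy orbitals `orbital_a` and `orbital_b` are hypothetical, chosen for illustration, and the determinant is computed as a signed sum over permutations, which is affordable only for a small L.

```python
from itertools import permutations
from math import exp, factorial, sqrt

def sign(p):
    """Sign of a permutation (tuple of 0-based images), by counting inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def det(a):
    """Determinant as a signed sum over all permutations (small matrices only)."""
    n = len(a)
    total = 0.0
    for p in permutations(range(n)):
        term = sign(p)
        for i in range(n):
            term *= a[i][p[i]]
        total += term
    return total

def slater(orbitals, positions):
    """(1/sqrt(L!)) det of the matrix whose (i, n) entry is orbitals[n](positions[i])."""
    matrix = [[orbital(r) for orbital in orbitals] for r in positions]
    return det(matrix) / sqrt(factorial(len(orbitals)))

# Hypothetical 1-D toy orbitals (not from the text), for illustration only.
def orbital_a(x):
    return exp(-x * x)

def orbital_b(x):
    return x * exp(-x * x)

w_12 = slater([orbital_a, orbital_b], [0.3, 1.1])
w_21 = slater([orbital_a, orbital_b], [1.1, 0.3])    # the two electrons interchanged
w_same = slater([orbital_a, orbital_a], [0.3, 1.1])  # same orbital used twice
```

Here `w_21` is the negative of `w_12`, and `w_same` vanishes because the matrix has two identical columns.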


14.5 The Permutation Group 


14.5.1 Permutation 


Consider the set of the first n natural numbers:

$$\{1, 2, 3, \ldots, n\}.$$

A permutation is a mapping from this set onto itself. This means that the permutation
maps each natural number 1 ≤ i ≤ n to a distinct natural number 1 ≤ p(i) ≤ n.
This is denoted by

$$p(\{1, 2, 3, \ldots, n\}).$$


14.5.2 Switch 


For example, the switch

$$(1 \to 3)$$

switches 1 with 3: at the same time, 1 maps to 3, 3 maps to 1, and the rest of the
numbers remain fixed: 2 maps to 2, 4 maps to 4, and so on. For this reason, the switch
is symmetric: it can also be written as

$$(1 \to 3) = (3 \to 1).$$

After the switch, the list of n natural numbers takes a new order:

$$\{3, 2, 1, 4, 5, \ldots, n\}.$$

We say that the switch is odd: it picks a minus sign. This is denoted by

$$e((3 \to 1)) = -1.$$


14.5.3 Cycle 


A cycle, on the other hand, can be more complicated. For example,

$$(1 \to 3 \to 2)$$

maps 1 to 3, 3 to 2, and 2 to 1 (at the same time). This is why the cycle is indeed
cyclic: it can also be written as

$$(1 \to 3 \to 2) = (2 \to 1 \to 3) = (3 \to 2 \to 1).$$

In all these forms, the new order is

$$\{2, 3, 1, 4, 5, \ldots, n\}.$$

What does this cycle do? It maps 1 to 3. To make room, both 3 and 2 must shift one
space leftwards. As a matter of fact, this can be written as the composition (or product)
of two switches:

$$(3 \to 2 \to 1) = (3 \to 2)(2 \to 1).$$

This composition is carried out right to left: 1 switches with 2, producing

$$\{2, 1, 3, 4, 5, \ldots, n\}.$$

Then, it switches with 3 as well, producing

$$\{2, 3, 1, 4, 5, \ldots, n\},$$

as required.

Why is this decomposition useful? Because it tells us that the cycle is even, not
odd. Indeed, each switch picks a minus sign, and the two minus signs cancel each other:

$$e((3 \to 2 \to 1)) = e((3 \to 2)) \, e((2 \to 1)) = (-1)(-1) = 1.$$

Let’s introduce a short notation for this cycle:

$$[3 \to 1] \equiv (3 \to 2 \to 1).$$

As discussed above, this cycle is even:

$$e([3 \to 1]) = 1.$$

This is also called a 3-cycle. The switch, on the other hand, is also called a 2-cycle.
Likewise, we can also write a yet longer cycle, a 4-cycle:

$$[4 \to 1] \equiv (4 \to 3 \to 2 \to 1).$$

This cycle can be decomposed as the composition of three switches:

$$[4 \to 1] = (4 \to 3)(3 \to 2)(2 \to 1).$$

Again, this is read right to left: 1 switches with 2, then with 3, then with 4. The
result is

$$\{2, 3, 4, 1, 5, 6, \ldots, n\},$$

as required. This is why this cycle is odd, not even:

$$e([4 \to 1]) = -1,$$

and so on.
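
To experiment with these cycles, here is a small sketch in Python. The storage convention is an assumption of this sketch, not of the text: a permutation on {1, ..., n} is kept as a list `p` with `p[j-1]` being the image of j, and the helper names `switch`, `compose`, `new_order`, and `sign` are hypothetical.

```python
def switch(i, k, n):
    """The switch (i -> k) on {1, ..., n}: a list p with p[j-1] = image of j."""
    p = list(range(1, n + 1))
    p[i - 1], p[k - 1] = k, i
    return p

def compose(p, q):
    """The product pq, read right to left: first apply q, then p."""
    return [p[q[j] - 1] for j in range(len(q))]

def new_order(p):
    """The reordered list: the number j is moved to position p(j)."""
    order = [0] * len(p)
    for j, image in enumerate(p, start=1):
        order[image - 1] = j
    return order

def sign(p):
    """Undo p by switches, one at a time; each switch flips the sign."""
    p = list(p)
    s = 1
    for j in range(len(p)):
        while p[j] != j + 1:
            t = p[j] - 1
            p[j], p[t] = p[t], p[j]
            s = -s
    return s

n = 5
cycle3 = compose(switch(3, 2, n), switch(2, 1, n))  # [3 -> 1] = (3 -> 2)(2 -> 1)
cycle4 = compose(switch(4, 3, n), cycle3)           # [4 -> 1] = (4 -> 3)(3 -> 2)(2 -> 1)
```

With n = 5, `new_order(cycle3)` gives the list 2, 3, 1, 4, 5 and `new_order(cycle4)` gives 2, 3, 4, 1, 5, in agreement with the orders above, while `sign` returns 1 for the 3-cycle and −1 for the 4-cycle.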


14.5.4 Permutation Group 


What does a general permutation look like? Well, suppose that 1 maps to some
1 ≤ k ≤ n. This occupies k: the rest of the numbers (from 2 to n) can no longer
map to k. To meet this condition, they should be mapped in two stages: first, mix
them (using a smaller permutation). Then, shift those numbers that lie from 2 to k
one space leftwards. This way, k is not used, as required. In summary, the original
permutation has been decomposed as

$$p(\{1, 2, 3, \ldots, n\}) = [k \to 1] \, q(\{2, 3, 4, \ldots, n\}),$$

for a unique (smaller) permutation q that mirrors p.

Let’s place all permutations on {1, 2, 3, ..., n} in a new group:

$$P(\{1, 2, 3, \ldots, n\}).$$

Why is this a legitimate group? Well, it has an algebraic operation: composition of
mappings. This way, it is indeed associative. Furthermore, it contains the identity
permutation that changes nothing. Once the identity permutation is composed with
any other permutation, it leaves it as is. Finally, every permutation has a unique
inverse permutation that undoes it.

Thanks to the above, the entire permutation group can be written as the (disjoint) union of
n smaller sets, each a copy of the smaller group P({2, 3, 4, ..., n}), composed with a cycle:

$$P(\{1, 2, 3, \ldots, n\}) = \bigcup_{k=1}^{n} [k \to 1] \, P(\{2, 3, 4, \ldots, n\}).$$

Note that, in this union, the k-cycles [k → 1] have alternating signs: even, odd, even,
odd, and so on. In other words,

$$e([k \to 1]) = (-1)^{k-1}.$$

This will help redefine the determinant.


14.5.5 Number of Permutations 


To use the above group more easily, let’s denote it by

$$P \equiv P(\{1, 2, 3, \ldots, n\})$$

for short. How big is P? In other words, how many permutations are there in P?
Well, let’s count. 1 can map to n possible numbers. On top of these, 2 can map to
n − 1 possible numbers. On top of these, 3 can map to n − 2 possible numbers, and
so on. In total, there are n! different permutations in P:

$$|P| = n!.$$

Half of them are odd, and half are even (for n ≥ 2). To see this, pick some odd permutation
q ∈ P, say

$$q \equiv (1 \to 2).$$

This way, for every permutation p ∈ P,

$$e(qp) = e(q) e(p) = -e(p).$$

Therefore, the invertible mapping

$$p \to qp$$

maps every odd permutation to an even one, and every even permutation to an odd
one.
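
The counting argument can be verified by brute force for a small n. This is a sketch only: permutations are stored as tuples of images (an assumption of this sketch), the sign is computed by counting inversions rather than by the switch decomposition used in the text, and `apply_q` is a hypothetical helper that composes the switch q = (1 → 2) on top of p.

```python
from itertools import permutations
from math import factorial

def sign(p):
    """Sign of a permutation (tuple of images), by counting inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def apply_q(p):
    """The product qp with q = (1 -> 2): first apply p, then switch 1 and 2."""
    swap = {1: 2, 2: 1}
    return tuple(swap.get(image, image) for image in p)

n = 4
all_perms = list(permutations(range(1, n + 1)))
even = [p for p in all_perms if sign(p) == 1]
odd = [p for p in all_perms if sign(p) == -1]
image_of_even = {apply_q(p) for p in even}  # should be exactly the odd permutations
```

For n = 4, this lists 4! = 24 permutations, 12 even and 12 odd, and `image_of_even` coincides with the set of odd permutations.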


14.6 Determinant 


14.6.1 Determinant: A New Definition 


How to use the permutation group in practice? Consider an n × n (complex) matrix:

$$A \equiv \left( a_{i,j} \right)_{1 \le i, j \le n}.$$

Let’s redefine its determinant as a new (complex) number: sum of products of
elements. Each product multiplies n elements: one element from each row, and one
element from each column:

$$\det(A) \equiv \sum_{p \in P} e(p) \, a_{1,p(1)} a_{2,p(2)} a_{3,p(3)} \cdots a_{n,p(n)}.$$

Why is this the same as the original definition in Chap. 2, Sect. 2.1.1? To see this, use
mathematical induction on n, and use the union in Sect. 14.5.4 to mirror the minors
in the original definition.
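
The equivalence of the two definitions can also be tested numerically. In the sketch below, `det_permutations` implements the sum over permutations, and `det_minors` expands by minors along the first row. The latter is only assumed to stand in for the earlier definition, which is not reproduced here. Both function names are hypothetical.

```python
from itertools import permutations

def sign(p):
    """Sign of a permutation (tuple of 0-based images), by counting inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def det_permutations(a):
    """Sum over all permutations p of e(p) a[0][p(0)] a[1][p(1)] ... (n! terms)."""
    n = len(a)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for i in range(n):
            term *= a[i][p[i]]
        total += term
    return total

def det_minors(a):
    """Recursive expansion by minors along the first row (assumed earlier definition)."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for k in range(n):
        minor = [row[:k] + row[k + 1:] for row in a[1:]]
        total += (-1) ** k * a[0][k] * det_minors(minor)
    return total

A = [[2, -1, 0], [1, 3, 4], [0, 5, -2]]
B = [[1 + 2j, 3], [-1j, 2]]  # a small complex example
```

Both functions agree on `A` and on `B`. Note the cost: the sum has n! terms, so this form is a tool for proofs rather than for large computations.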


14.6.2 Determinant of the Transpose 


We can now see why the transpose matrix has the same determinant. Indeed, each
permutation is mirrored by its unique inverse. Furthermore, if it is even (odd), then
its inverse is even (odd) as well. Indeed,

$$1 = e\left(p p^{-1}\right) = e(p) \, e\left(p^{-1}\right).$$

Thus, instead of scanning all permutations one by one, scan all inverse permutations:

$$\begin{aligned}
\det(A) &= \sum_{p \in P} e(p) \, a_{1,p(1)} a_{2,p(2)} a_{3,p(3)} \cdots a_{n,p(n)} \\
&= \sum_{p^{-1} \in P} e(p) \, a_{1,p(1)} a_{2,p(2)} a_{3,p(3)} \cdots a_{n,p(n)} \\
&= \sum_{p^{-1} \in P} e\left(p^{-1}\right) a_{p^{-1}(1),1} a_{p^{-1}(2),2} a_{p^{-1}(3),3} \cdots a_{p^{-1}(n),n} \\
&= \sum_{p \in P} e(p) \, a_{p(1),1} a_{p(2),2} a_{p(3),3} \cdots a_{p(n),n} \\
&= \det\left(A^t\right).
\end{aligned}$$

Let’s use this to calculate the determinant of a product of two matrices.


14.6.3 Determinant of a Product 


Consider now two (complex) matrices of order n:

$$A \equiv \left( a_{i,j} \right)_{1 \le i, j \le n} \quad \text{and} \quad B \equiv \left( b_{i,j} \right)_{1 \le i, j \le n}.$$

What is the determinant of AB? In its new definition, it is just

$$\det(AB) = \sum_{p \in P} e(p) \, (AB)_{1,p(1)} (AB)_{2,p(2)} \cdots (AB)_{n,p(n)}.$$

Inside the sum, we have a product of n factors of the form

$$(AB)_{i,p(i)} = \sum_{j=1}^{n} a_{i,j} b_{j,p(i)}.$$

The above product scans i = 1, 2, 3, ..., n, and multiplies these factors one by one.
Upon opening parentheses, one must pick one particular j from each such factor.
Which j to pick? Well, there is no point in picking the same j from two different factors,
say the ith and kth factors. After all, the resulting product will soon be canceled with
a similar product, obtained from a permutation of the form

$$(i \to k) p,$$

which mirrors p: it is nearly the same as p, but also switches i and k on top, picking
an extra minus sign on top.
So, we had better focus on a more relevant option: pick a different j from each factor,
say

$$j \equiv q(i),$$

for some permutation q ∈ P. This way,

$$\begin{aligned}
\det(AB) &= \sum_{p, q \in P} e(p) \, a_{1,q(1)} b_{q(1),p(1)} \, a_{2,q(2)} b_{q(2),p(2)} \cdots a_{n,q(n)} b_{q(n),p(n)} \\
&= \sum_{p, q^{-1} \in P} e(p) \, a_{q^{-1}(1),1} b_{1,pq^{-1}(1)} \, a_{q^{-1}(2),2} b_{2,pq^{-1}(2)} \cdots a_{q^{-1}(n),n} b_{n,pq^{-1}(n)} \\
&= \sum_{p, q \in P} e(p) \, a_{q(1),1} b_{1,pq(1)} \, a_{q(2),2} b_{2,pq(2)} \cdots a_{q(n),n} b_{n,pq(n)} \\
&= \sum_{r, q \in P} e(r) e(q) \, a_{q(1),1} b_{1,r(1)} \, a_{q(2),2} b_{2,r(2)} \cdots a_{q(n),n} b_{n,r(n)} \\
&= \det\left(A^t\right) \det(B) \\
&= \det(A) \det(B).
\end{aligned}$$

(In the fourth line, r ≡ pq, so e(p) = e(r)e(q).) In summary, the determinant of the product is indeed the product of the determinants:

$$\det(AB) = \det(A) \det(B).$$
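
Both results (for the transpose and for the product) are easy to check on small integer matrices, where the arithmetic is exact. This is a minimal sketch: the two matrices are arbitrary examples, not taken from the text, and the helper names are hypothetical.

```python
from itertools import permutations

def sign(p):
    """Sign of a permutation (tuple of 0-based images), by counting inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def det(a):
    """Determinant in its new definition: a signed sum over all permutations."""
    n = len(a)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for i in range(n):
            term *= a[i][p[i]]
        total += term
    return total

def matmul(a, b):
    """Plain matrix product of two square matrices of the same order."""
    n = len(a)
    return [[sum(a[i][j] * b[j][k] for j in range(n)) for k in range(n)] for i in range(n)]

def transpose(a):
    """Interchange rows and columns."""
    return [list(column) for column in zip(*a)]

A = [[1, 2, 0], [3, -1, 4], [0, 5, 2]]
B = [[2, 0, 1], [1, 1, 0], [-3, 2, 5]]

lhs = det(matmul(A, B))
rhs = det(A) * det(B)
```

Here `lhs` and `rhs` coincide, and `det(transpose(A))` equals `det(A)`.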


Let’s use this result further. 


14.6.4 Orthogonal and Unitary Matrix 


As a result, if O is an orthogonal matrix, then its determinant is either 1 or −1:

$$1 = \det\left(O^t O\right) = \det\left(O^t\right) \det(O) = (\det(O))^2.$$

Furthermore, the determinant of the Hermitian adjoint is the complex conjugate of
the original determinant:

$$\det\left(A^h\right) = \det\left(\bar{A}^t\right) = \det\left(\bar{A}\right) = \overline{\det(A)}.$$

As a result, if U is a unitary matrix, then its determinant is a complex number of
absolute value 1:

$$1 = \det\left(U^h U\right) = \det\left(U^h\right) \det(U) = |\det(U)|^2.$$

This will be useful below.


14.6.5 The Characteristic Polynomial


Let’s write the characteristic polynomial as

$$\det(A - \lambda I) = q_0 + q_1 \lambda + q_2 \lambda^2 + \cdots + q_{n-1} \lambda^{n-1} + q_n \lambda^n$$

(Chap. 3, Sect. 3.1.1). What are these coefficients? Well, let’s start with the leading
term: $q_n \lambda^n$. What is the coefficient $q_n$? To tell this, look at this determinant in terms of
the new definition. Only one permutation contributes to $\lambda^n$: the identity permutation.
Indeed, it produces the product

$$(a_{1,1} - \lambda)(a_{2,2} - \lambda)(a_{3,3} - \lambda) \cdots (a_{n,n} - \lambda).$$

Upon opening parentheses, pick −λ from each factor. This produces the leading
term:

$$q_n \lambda^n = (-1)^n \lambda^n.$$

In summary,

$$q_n = (-1)^n.$$

Next, what is $q_{n-1}$? Again, only the identity permutation contributes to $\lambda^{n-1}$. (All
others contribute to $\lambda^{n-2}$ at most.) As discussed above, it produces the product

$$(a_{1,1} - \lambda)(a_{2,2} - \lambda)(a_{3,3} - \lambda) \cdots (a_{n,n} - \lambda).$$

Upon opening parentheses, pick −λ from all factors but one. This can be done in
n different ways, producing

$$q_{n-1} \lambda^{n-1} = (-\lambda)^{n-1} \sum_{i=1}^{n} a_{i,i}.$$

In summary,

$$q_{n-1} = (-1)^{n-1} \sum_{i=1}^{n} a_{i,i} = (-1)^{n-1} \, \mathrm{trace}(A),$$

where the trace of a matrix is the sum of its main-diagonal elements.

By now, we’ve already uncovered two coefficients in the characteristic polynomial.
Finally, what is $q_0$? Again, look at det(A − λI) in its new definition. This time,
however, look at all permutations. Each permutation produces a product of n factors.
From these factors, never pick −λ. After all, this is the only way to contribute to $q_0$.
The result is

$$q_0 = \det(A).$$


14.6.6 Eigenvalues and Trace


How to use the above in practice? For this purpose, we must write the characteristic
polynomial in a new form: not as a sum but as a product. Fortunately, it has degree
n, so it has n (complex) roots:

$$\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_n.$$

At these $\lambda_i$’s, the characteristic polynomial vanishes. (Some of them may be the
same, but this doesn’t matter.) These are indeed the eigenvalues of A. Thanks to
them, the characteristic polynomial can also be written as

$$\det(A - \lambda I) = (\lambda_1 - \lambda)(\lambda_2 - \lambda)(\lambda_3 - \lambda) \cdots (\lambda_n - \lambda).$$

This way, by setting $\lambda = \lambda_i$ (1 ≤ i ≤ n), we indeed obtain zero, as required.
Furthermore, the leading term is indeed $(-1)^n \lambda^n$, as required.
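
As a quick illustration, consider a small worked example (the 2 × 2 matrix below is chosen here for convenience, and is not taken from the text). It shows the characteristic polynomial both as a sum and as a product, and compares its coefficients with the formulas of Sect. 14.6.5:

```latex
A \equiv \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}
\quad \Longrightarrow \quad
\det(A - \lambda I) = (2 - \lambda)^2 - 1 = 3 - 4\lambda + \lambda^2
                    = (1 - \lambda)(3 - \lambda).

% Coefficients, compared with Sect. 14.6.5 (n = 2):
q_2 = 1 = (-1)^2, \qquad
q_1 = -4 = (-1)^{1} \, \mathrm{trace}(A) = -(2 + 2), \qquad
q_0 = 3 = \det(A) = 2 \cdot 2 - 1 \cdot 1.

% Roots (eigenvalues): \lambda_1 = 1 and \lambda_2 = 3.
```

Setting λ = 1 or λ = 3 in the product form indeed gives zero.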


In the latter form, let’s open parentheses, and pick −λ from n − 1 factors. This
way, we obtain $q_{n-1}$ in a new form:

$$q_{n-1} = (-1)^{n-1} \sum_{i=1}^{n} \lambda_i.$$

Thus, the trace is also the sum of eigenvalues:

$$\mathrm{trace}(A) = \sum_{i=1}^{n} \lambda_i.$$

Finally, upon opening parentheses in the latter formula, never pick −λ from any
factor. This way, we obtain $q_0$ in a new form:

$$q_0 = \lambda_1 \lambda_2 \lambda_3 \cdots \lambda_n.$$

In summary, the determinant is also the product of eigenvalues:

$$\det(A) = \lambda_1 \lambda_2 \lambda_3 \cdots \lambda_n.$$


14.7 Orbitals and Their Canonical Form


14.7.1 The Overlap Matrix and Its Diagonal Form


Let’s use the above in the context of functions, defined in the three-dimensional Cartesian
space. The overlap of two functions is obtained by integrating them against one
another. This can be viewed as an extension of the inner product, defined in Chap. 1,
Sect. 1.7.2. This way, we can talk about orthogonality, and even orthonormality.

In particular, once this is calculated for every two orbitals, we obtain the L × L
overlap matrix:

$$O \equiv \left( O_{i,n} \right)_{1 \le i, n \le L} \equiv \left( \int\int\int \bar{v}^{(i)} v^{(n)} \, dx\,dy\,dz \right)_{1 \le i, n \le L}.$$

(Although we use the notation “O”, this is not necessarily an orthogonal matrix!)
A proper orbital should have norm 1: overlap 1 with itself. This way, it makes
a proper probability function. Still, by now, we don’t have this property as yet: the
main-diagonal elements $O_{i,i}$ may still be different from 1.
Fortunately, O is Hermitian and positive semidefinite (Chap. 1, Sect. 1.12.1). As
such, it can be diagonalized by a unitary matrix U, independent of r:

$$O = U^h D U,$$

where

$$D \equiv \mathrm{diag}\left( D_{1,1}, D_{2,2}, \ldots, D_{L,L} \right)$$

is a diagonal matrix, with the eigenvalues of O on its main diagonal:

$$D_{n,n} \ge 0, \quad 1 \le n \le L.$$

Often, the orbitals are linearly independent of each other: they have no linear combination
that vanishes (almost) everywhere. In this case, O is not only positive semidefinite
but also positive definite: its eigenvalues are strictly positive:

$$D_{n,n} > 0, \quad 1 \le n \le L.$$

In this case, the overlap matrix has a positive determinant:

$$\det(O) = \det(D) = D_{1,1} D_{2,2} \cdots D_{L,L} > 0.$$


14.7.2 Unitary Transformation


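Let’s use the unitary matrix U to transform the original orbitals to the new orbitals

$$u^{(n)} \equiv \sum_{k=1}^{L} \bar{U}_{n,k} \, v^{(k)}, \quad 1 \le n \le L.$$

What is the overlap matrix of these new orbitals? Let’s look at its (i, n)th element.
Since U is independent of r, this element takes the form

$$\begin{aligned}
\int\int\int \bar{u}^{(i)}(r) \, u^{(n)}(r) \, dx\,dy\,dz
&= \int\int\int \sum_{j=1}^{L} U_{i,j} \bar{v}^{(j)}(r) \sum_{k=1}^{L} \bar{U}_{n,k} v^{(k)}(r) \, dx\,dy\,dz \\
&= \sum_{j=1}^{L} \sum_{k=1}^{L} U_{i,j} \, O_{j,k} \, \bar{U}_{n,k} \\
&= \left( U O U^h \right)_{i,n} \\
&= D_{i,n}.
\end{aligned}$$

Since D is diagonal, the new orbitals are orthogonal to each other: they have zero
overlap with each other. Let’s go ahead and use this property.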


14.7.3 Slater Determinant and Its Overlap 


What is the Slater determinant? Well, like every determinant, it is the sum of L!
different products, using L! different permutations of 1, 2, ..., L. This determinant
is insensitive to the above unitary transformation. Indeed, once written in terms of
the new orbitals, it just picks a complex factor of absolute value 1: det(U). This has
no effect on its overlap with itself:

$$\begin{aligned}
&\int\int\cdots\int \left| \frac{1}{\sqrt{L!}} \det\left( \left( v^{(n)}(r_i) \right)_{1 \le i, n \le L} \right) \right|^2 dx_1 dy_1 dz_1 \cdots dx_L dy_L dz_L \\
&= \int\int\cdots\int \left| \frac{1}{\sqrt{L!}} \det(U) \det\left( \left( u^{(n)}(r_i) \right)_{1 \le i, n \le L} \right) \right|^2 dx_1 dy_1 dz_1 \cdots dx_L dy_L dz_L \\
&= \int\int\cdots\int \left| \frac{1}{\sqrt{L!}} \det\left( \left( u^{(n)}(r_i) \right)_{1 \le i, n \le L} \right) \right|^2 dx_1 dy_1 dz_1 \cdots dx_L dy_L dz_L \\
&= D_{1,1} D_{2,2} \cdots D_{L,L}.
\end{aligned}$$

Why is the latter formula correct? Well, what do we have in the integrand? Just the
complex conjugate of the determinant, times the determinant itself. In both, the same
permutation must be picked, or there would be no contribution at all (thanks to the
orthogonality of the transformed orbitals). Once the same permutation is picked, it
has no effect: it just interchanges the dummy variables integrated upon. Thanks to
the normalization factor 1/L!, the above formula indeed holds.


14.7.4 The Canonical Form


So, it is convenient to work with the transformed orbitals, which have a diagonal
overlap matrix. In other words, they are orthogonal to each other. If they are also
linearly independent of each other, then they can also be normalized:

$$u^{(n)} \leftarrow D_{n,n}^{-1/2} \, u^{(n)}, \quad 1 \le n \le L.$$

This is indeed their canonical form. In it, they are also orthonormal. In other words,
their overlap matrix is just the identity matrix.

We now have a simple algorithm to normalize the original Slater determinant.
First, calculate the original overlap matrix. Then, diagonalize it. Then, use the unitary
matrix to transform the orbitals. Finally, normalize the transformed orbitals, to obtain
their canonical form. In terms of these final orbitals, the new Slater determinant indeed
has overlap 1 with itself, as required. This is why the canonical form is so useful.

Later on, we’ll make sure to have the canonical form automatically for free, with
no need to use this algorithm. Therefore, we can assume that the orbitals are already
in their canonical form. This will help simplify the expected energy considerably.
Later on, we’ll see that this assumption is indeed plausible.
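
The four steps of the algorithm can be sketched for the smallest nontrivial case, L = 2. Everything specific below is an assumption made for illustration: the two orbitals are hypothetical real Gaussians sampled on a one-dimensional grid, the overlap is approximated by a Riemann sum, and the 2 × 2 overlap matrix is diagonalized by a plane rotation (which suffices for a real symmetric 2 × 2 matrix). Since everything is real, complex conjugates drop.

```python
from math import atan2, cos, exp, sin, sqrt

def overlap(f, g, h):
    """Riemann-sum overlap of two real sampled functions with grid spacing h."""
    return h * sum(fi * gi for fi, gi in zip(f, g))

# 1-D grid and two hypothetical, non-orthogonal real orbitals (illustration only).
h = 0.01
grid = [-8.0 + h * j for j in range(1601)]
v = [[exp(-x * x) for x in grid],
     [exp(-(x - 1.0) ** 2) for x in grid]]

# Step 1: the original overlap matrix O (2 x 2, real symmetric here).
O = [[overlap(v[i], v[n], h) for n in range(2)] for i in range(2)]

# Step 2: diagonalize O by a rotation U, whose rows are eigenvectors of O.
theta = 0.5 * atan2(2.0 * O[0][1], O[0][0] - O[1][1])
U = [[cos(theta), sin(theta)],
     [-sin(theta), cos(theta)]]

# Step 3: transform the orbitals: u[n] = sum over k of U[n][k] v[k].
u = [[U[n][0] * a + U[n][1] * b for a, b in zip(v[0], v[1])] for n in range(2)]

# Step 4: normalize each transformed orbital by the square root of its own overlap.
D = [overlap(u[n], u[n], h) for n in range(2)]
canonical = [[value / sqrt(D[n]) for value in u[n]] for n in range(2)]

# The overlap matrix of the canonical orbitals should be the identity matrix.
new_overlap = [[overlap(canonical[i], canonical[n], h) for n in range(2)]
               for i in range(2)]
```

Up to rounding, `new_overlap` is the 2 × 2 identity matrix, and the product `D[0] * D[1]` agrees with the determinant of `O`.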


14.8 Expected Energy 


14.8.1 Coulomb and Exchange Integrals 


Yet another Slater determinant can also be defined for the last M − L orbitals of
the remaining spin-down electrons. In summary, our up-to-date wave function takes
the form of a product of two Slater determinants:

$$w(r_1, r_2, \ldots, r_M) \equiv \frac{1}{\sqrt{L!}} \det\left( \left( v^{(n)}(r_i) \right)_{1 \le i, n \le L} \right) \cdot \frac{1}{\sqrt{(M-L)!}} \det\left( \left( v^{(n)}(r_i) \right)_{L < i, n \le M} \right).$$

With this new wave function, what is the expected potential energy? Fortunately, it
is much simpler than the general form in Sect. 14.2.2. In fact, it is nearly as simple as
in Sect. 14.3.2: it contains just one more double sum of new integrals, the exchange
integrals.

To see this, as discussed above, assume that $v^{(1)}, v^{(2)}, \ldots, v^{(L)}$ are already
in their canonical form: orthonormal in terms of overlap. Likewise, assume that
$v^{(L+1)}, v^{(L+2)}, \ldots, v^{(M)}$ are orthonormal as well. Later on, we’ll make sure that this
is indeed the case.

In the expected energy, we often integrate on the probability function $|w|^2$. It is
sometimes more convenient to write it as

$$|w|^2 = \bar{w} w.$$

This way, the potential due to attraction to the nucleus simplifies to read

$$\begin{aligned}
&-\int\int\cdots\int \sum_{n=1}^{M} \frac{|w|^2}{\|r_n\|} \, dx_1 dy_1 dz_1 \, dx_2 dy_2 dz_2 \cdots dx_M dy_M dz_M \\
&= -\int\int\cdots\int \bar{w} \sum_{n=1}^{M} \frac{1}{\|r_n\|} w \, dx_1 dy_1 dz_1 \, dx_2 dy_2 dz_2 \cdots dx_M dy_M dz_M \\
&= -\sum_{n=1}^{M} \int\int\int \frac{|v^{(n)}(r)|^2}{\|r\|} \, dx\,dy\,dz.
\end{aligned}$$

Why is this correct? Well, let’s focus on the former Slater determinant in w. Like
every determinant, it is just the sum of products, each using a different permutation
of 1, 2, ..., L. Now, to have a nonzero integral, one must pick the same permutation
in $\bar{w}$ as in w. (Otherwise, thanks to orthogonality, there would be no contribution at
all.) Once the same permutation is picked, it has no effect: it just interchanges the
dummy variables integrated upon. Thanks to the normalization factor $1/\sqrt{L!}$, the
above formula indeed holds.

In the Coulomb integrals, on the other hand, things are not so simple any more. To
have a nonzero contribution, one has two options. In the main option, pick the same
permutation in $\bar{w}$ as in w. This will produce new Coulomb integrals of the form

$$\sum_{i=1}^{M} \sum_{n=i+1}^{M} \int\int\int\int\int\int |v^{(i)}(\tilde r)|^2 \frac{1}{\|r - \tilde r\|} |v^{(n)}(r)|^2 \, dx\,dy\,dz\,d\tilde x\,d\tilde y\,d\tilde z.$$

Still, this is not the only option: in $\bar{w}$, one could also pick a slightly different permutation,
in which i and n switch on top (1 ≤ i, n ≤ L, or L < i, n ≤ M). For
example, if p is picked in w, then pick

$$(n \to i) p$$

in $\bar{w}$. If p is even (odd), then (n → i)p is odd (even):

$$e((n \to i) p) = e((n \to i)) \, e(p) = -e(p).$$

This will produce the so-called exchange integrals, with a minus sign:

$$-\sum_{i=1}^{M} \sum_{n > i,\ \text{same spin}} \int\int\int\int\int\int \bar{v}^{(i)}(\tilde r) v^{(n)}(\tilde r) \frac{1}{\|r - \tilde r\|} \bar{v}^{(n)}(r) v^{(i)}(r) \, dx\,dy\,dz\,d\tilde x\,d\tilde y\,d\tilde z.$$

Let’s see what this means for each individual orbital.


14.8.2 Effective Potential Energy 


What does this mean for an individual electron? In other words, what is the effective
potential that the nth electron feels? Well, it feels attraction to the nucleus:

$$-\int\int\int \bar{v}^{(n)} \frac{1}{\|r\|} v^{(n)} \, dx\,dy\,dz.$$

On top of this, it also feels repulsion from all other electrons:

$$\sum_{i=1}^{M} \int\int\int\int\int\int \bar{v}^{(i)}(\tilde r) v^{(i)}(\tilde r) \frac{1}{\|r - \tilde r\|} \bar{v}^{(n)}(r) v^{(n)}(r) \, dx\,dy\,dz\,d\tilde x\,d\tilde y\,d\tilde z.$$

Here, one may ask: does it feel any repulsion from itself? No, it doesn’t. Still, there
is one fictitious term in this sum: the term for which i = n. Don’t worry: it will drop
soon.

On top of this, it also feels the exchange force from all other electrons of the same
spin:

$$-\sum_{i,\ \text{same spin as}\ n} \int\int\int\int\int\int \bar{v}^{(i)}(\tilde r) v^{(n)}(\tilde r) \frac{1}{\|r - \tilde r\|} \bar{v}^{(n)}(r) v^{(i)}(r) \, dx\,dy\,dz\,d\tilde x\,d\tilde y\,d\tilde z.$$

Here, one may ask: does it feel any exchange force from itself? No, it doesn’t. There
is one fictitious term in the above sum: the term for which i = n. Fortunately, it
cancels the fictitious term introduced above.


14.8.3 Kinetic Energy 


On top of this, it also has its own kinetic energy:

$$\frac{1}{2} \int\int\int \nabla \bar{v}^{(n)} \cdot \nabla v^{(n)} \, dx\,dy\,dz.$$

Together, all these terms must sum to its expected energy:

$$E \int\int\int |v^{(n)}|^2 \, dx\,dy\,dz,$$

where E is a constant energy level: an eigenvalue of the Hamiltonian. This is indeed
quantum mechanics: energy comes in discrete quantities. Only these energy levels
are allowed.


14.9 The Hartree-Fock System 


14.9.1 Basis Functions—The Coefficient Matrix 


So far, our orbital has been a function in the three-dimensional Cartesian space. This
is too general. To help uncover the orbital, we must approximate it by piecewise-polynomial
functions. More precisely, as in Chap. 13, let’s write it as a linear combination
of basis functions:

$$v^{(n)} = \sum_{j=1}^{K} c_j \psi_j,$$

where the $\psi_j$’s are the basis functions in the mesh, and the $c_j$’s are their (unknown)
complex coefficients. Let’s plug this in the effective energy in Sect. 14.8.2, term by
term. For this purpose, in each term, replace $\bar{v}^{(n)}$ by $\bar{\psi}_l$, and $v^{(n)}$ by $\psi_j$. This will
assemble the (l, j)th element in the coefficient matrix A:

$$\begin{aligned}
a_{l,j} \equiv{} & -\int\int\int \bar{\psi}_l \frac{1}{\|r\|} \psi_j \, dx\,dy\,dz \\
& + \sum_{i=1}^{M} \int\int\int\int\int\int |v^{(i)}(\tilde r)|^2 \frac{1}{\|r - \tilde r\|} \bar{\psi}_l(r) \psi_j(r) \, dx\,dy\,dz\,d\tilde x\,d\tilde y\,d\tilde z \\
& - \sum_{i,\ \text{same spin as}\ n} \int\int\int\int\int\int \bar{v}^{(i)}(\tilde r) \psi_j(\tilde r) \frac{1}{\|r - \tilde r\|} \bar{\psi}_l(r) v^{(i)}(r) \, dx\,dy\,dz\,d\tilde x\,d\tilde y\,d\tilde z \\
& + \frac{1}{2} \int\int\int \nabla \bar{\psi}_l \cdot \nabla \psi_j \, dx\,dy\,dz
\end{aligned}$$

(1 ≤ l, j ≤ K). This way, A is no longer a constant matrix. On the contrary:
it depends on the orbitals: the unknown $v^{(i)}$’s in the above sums. Still, for fixed
orbitals, A is Hermitian, as required.


14.9.2 The Mass Matrix


Now, define also the mass matrix B. Its (l, j)th element is just the overlap of $\psi_j$ with
$\psi_l$:

$$b_{l,j} \equiv \int\int\int \bar{\psi}_l \psi_j \, dx\,dy\,dz$$

(1 ≤ l, j ≤ K).

How to solve for the unknown $c_j$’s? For this purpose, place them in a new
K-dimensional vector:

$$c \equiv (c_1, c_2, c_3, \ldots, c_K)^t.$$

This way, we can now plug our discrete approximation in. The effective energy in
Sects. 14.8.2-14.8.3 now takes the discrete form

$$c^h A c = E c^h B c,$$

where E is the (unknown) energy level. Thus, this is a nonlinear equation, with two
types of unknowns: the vector c, and the scalar E.


14.9.3 Pseudo-eigenvalue Problem 


How to make sure that the orbitals are indeed in their canonical form? In other words,
how to make sure that same-spin orbitals are indeed orthonormal in terms of overlap?
Fortunately, same-spin orbitals solve the same (generalized) eigenvalue problem,
with the same (Hermitian) coefficient matrix A. Their (generalized) eigenvalues are
different from each other: their distinct energy levels. Thanks to the exercises at the
end of Chap. 1, same-spin orbitals are indeed orthogonal to each other: they have zero
overlap with each other. Once normalized properly, they are indeed in their canonical
form, as assumed all along.

The mass matrix B is the same for all orbitals. The coefficient matrix A, on
the other hand, is not: it depends on the orbitals, and therefore has two forms. For
1 ≤ n ≤ L (orbital of a spin-up electron), A comes in one form. For L < n ≤ M
(orbital of a spin-down electron), on the other hand, A has a different form. Still,
only same-spin orbitals should be orthogonal to each other. Fortunately, they share
the same A, and solve the same pseudo-eigenvalue problem.

What should the energy level E be? Well, it should be minimal. For this purpose,
we need to solve a pseudo-eigenvalue problem:

$$A c = E B c.$$

The term “pseudo” reminds us that this is actually a nonlinear system: A depends on
the unknown orbitals. Still, for fixed orbitals (say the solutions), A is Hermitian, as
required.

Furthermore, on the right-hand side, we have yet another symmetric matrix: B.
Thus, this is a generalized eigenvalue problem. Thanks to the exercises at the end
of Chap. 1, its (generalized) eigenvectors indeed produce orthogonal orbitals of zero
overlap with each other. Once normalized properly, they are indeed in their canonical
form, as required.
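
The orthogonality of generalized eigenvectors can be seen in a tiny example with K = 2. This is a sketch under loud assumptions: the matrices `A` and `B` below are arbitrary real symmetric examples (with `B` positive definite), not assembled from any orbitals, and `A` is frozen, so the nonlinear (pseudo) nature of the real problem is ignored here. For a 2 × 2 problem, the energy levels are just the two roots of the quadratic equation det(A − EB) = 0.

```python
from math import sqrt

# Hypothetical fixed matrices: A symmetric, B symmetric positive definite.
A = [[2.0, 1.0], [1.0, 3.0]]
B = [[2.0, 0.5], [0.5, 1.0]]

def b_inner(c, d):
    """c^t B d: the overlap of the orbitals whose coefficient vectors are c and d."""
    return sum(c[l] * B[l][j] * d[j] for l in range(2) for j in range(2))

# det(A - E B) = 0 is a quadratic equation: alpha E^2 + beta E + gamma = 0.
alpha = B[0][0] * B[1][1] - B[0][1] ** 2
beta = -(A[0][0] * B[1][1] + A[1][1] * B[0][0] - 2.0 * A[0][1] * B[0][1])
gamma = A[0][0] * A[1][1] - A[0][1] ** 2
root = sqrt(beta * beta - 4.0 * alpha * gamma)
energies = sorted([(-beta - root) / (2.0 * alpha), (-beta + root) / (2.0 * alpha)])

def eigenvector(E):
    """Solve the first row of (A - E B) c = 0, then scale c to have c^t B c = 1."""
    c = [-(A[0][1] - E * B[0][1]), A[0][0] - E * B[0][0]]
    norm = sqrt(b_inner(c, c))
    return [x / norm for x in c]

def residual(E, c):
    """Largest component (in magnitude) of A c - E B c."""
    return max(abs(sum((A[l][j] - E * B[l][j]) * c[j] for j in range(2)))
               for l in range(2))

vectors = [eigenvector(E) for E in energies]
```

The two eigenvectors have zero overlap in terms of B, so the orbitals formed from them are orthogonal, and each has overlap 1 with itself. In the real Hartree-Fock system, A would have to be reassembled from the orbitals, which this sketch does not attempt.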


14.10 Exercises 


14.10.1 Permutation: Product of Switches


1. For some 1 ≤ i < k ≤ n, consider a switch of the form (i → k). What does it
   do? Hint: at the same time, i maps to k, and k maps back to i.
2. Show that it is symmetric:

   $$(i \to k) = (k \to i).$$

   What is its inverse? Hint: itself.
3. Consider a cycle of the form

   $$[k \to i] \equiv (k \to k-1 \to k-2 \to \cdots \to i+1 \to i).$$

   What does it do? Hint: at the same time, k maps to k − 1, k − 1 maps to k − 2,
   ..., i + 1 maps to i, and i maps back to k.
4. Is it symmetric as well? Hint: only if k = i + 1.
5. Write it as a product (composition) of switches:

   $$[k \to i] = (k \to k-1)(k-1 \to k-2) \cdots (i+1 \to i).$$

6. What is its inverse? Hint:

   $$[k \to i]^{-1} = (i \to i+1)(i+1 \to i+2) \cdots (k-1 \to k) = [i \to k].$$

7. Write the original switch (i → k) as a composition of two such cycles. Hint:

   $$(i \to k) = [i \to k-1] \circ [k \to i].$$

8. Conclude once again that the original switch is odd. Hint:

   $$e([i \to k-1] \circ [k \to i]) = e([i \to k-1]) \, e([k \to i]) = -1.$$

9. Consider now a more general 3-cycle:

   $$(l \to k \to i).$$

   What does it do? Hint: at the same time, l maps to k, k maps to i, and i maps
   back to l.
10. Write it as a product of two switches. Hint:

    $$(l \to k \to i) = (l \to k)(k \to i).$$

11. Conclude that it is even.
12. Consider now a general permutation

    $$p \in P.$$

    Write it as a product of general cycles. Hint: start from 1. It must map to some
    number, which must map to some other number, and so on, until returning back to
    1. This completes one general cycle. The rest is a disjoint (smaller) permutation,
    which can benefit from an induction hypothesis.
13. Conclude that every permutation can be written as a product of general cycles,
    each written as a product of switches, each written as a product of two more
    elementary cycles, each written as a product of most elementary switches, as
    above.
14. What can you say about the determinant of the transpose?
15. What can you say about the determinant of the Hermitian adjoint?
16. What can you say about the determinant of an orthogonal matrix?
17. What can you say about the determinant of a unitary matrix?


14.10.2. How to Have the Canonical Form? 


WwW 


. Consider the unitary transformation in Sect. 14.7.2. How does it affect the Slater 


determinant? Hint: it multiplies it by det(U"). 


. Does this affect the absolute value of the Slater determinant? Hint: from 


Sect. 14.6.4, 
| det(U)| = | det (U") | = 1. 


3. Consider the original overlap matrix O (Sect. 14.7.1). Is it Hermitian?
4. Is it positive semidefinite?
5. Is it positive definite? Hint: only if the original orbitals are linearly independent
of each other: they have no linear combination that vanishes (almost) everywhere.


6. Show that

$$\det(O) = \det(D) = D_{1,1} D_{2,2} \cdots D_{n,n}.$$

Hint: O and D share the same characteristic polynomial and determinant.
7. Is this positive? Hint: only if the original orbitals are linearly independent of
each other: they have no linear combination that vanishes (almost) everywhere.
8. How to normalize the original Slater determinant without calculating the unitary
transformation explicitly? Hint: just divide by $\sqrt{\det(O)}$.
9. Conclude that there is no need to transform explicitly: the original Slater
determinant could have been normalized by the square root of the determinant of the
original overlap matrix.
10. Describe the algorithm to have the canonical form.
11. Is it necessary?
12. Why is the canonical form good? Hint: it helps simplify the integrals in the
expected (potential) energy.
13. Show that same-spin orbitals are eigenfunctions of the same (generalized)
eigenvalue problem, with the same (Hermitian) coefficient matrix, and the same mass
matrix on the right-hand side. This way, if they have different (generalized)
eigenvalues (energy levels), then they are indeed orthogonal to each other: they have
zero overlap with each other. Once normalized properly, they are indeed in their
canonical form, as assumed all along. Hint: see exercises below.
14. Show that the coefficient matrix A in Sect. 14.9.1 is indeed Hermitian, provided
that the orbitals in the integrand are fixed.
15. Show that the mass matrix in Sect. 14.9.2 is indeed symmetric.
16. Look at the pseudo-eigenvalue problem in Sect. 14.9.3. In what sense is it
“pseudo?”
17. In what sense is it “generalized?”
18. Look at two (generalized) eigenvectors c of different (generalized) eigenvalues
E (energy levels). In what sense are they orthogonal to each other? Hint: see
exercises at the end of Chap. 1.
19. Conclude that the orbitals formed from them have zero overlap with each other.
20. Conclude that, once normalized properly, these orbitals are indeed in their
canonical form.
21. Conclude that, in retrospect, it was indeed plausible to simplify the Coulomb
and exchange integrals in Sect. 14.8.1.
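The normalization by the square root of det(O) can be checked numerically in the smallest case. The following is only a sketch under simplifying assumptions that are not taken from the text: two real orbitals sampled on a three-point grid (made-up numbers), sums in place of integrals, and the usual 1/sqrt(2!) prefactor in front of the two-particle Slater determinant.

```python
import math


def inner(u, v):
    """Discrete overlap of two real orbitals sampled on the same grid."""
    return sum(a * b for a, b in zip(u, v))


# Two real, linearly independent, non-orthogonal "orbitals" (made-up samples).
phi1 = [1.0, 0.5, 0.0]
phi2 = [0.2, 1.0, 0.3]

# The 2 x 2 overlap matrix O and its determinant.
O = [[inner(phi1, phi1), inner(phi1, phi2)],
     [inner(phi2, phi1), inner(phi2, phi2)]]
det_O = O[0][0] * O[1][1] - O[0][1] * O[1][0]


def slater(i, j):
    """Two-particle Slater determinant at grid points i and j."""
    return (phi1[i] * phi2[j] - phi1[j] * phi2[i]) / math.sqrt(2.0)


n = len(phi1)
norm_squared = sum(slater(i, j) ** 2 for i in range(n) for j in range(n))
normalized = sum((slater(i, j) / math.sqrt(det_O)) ** 2
                 for i in range(n) for j in range(n))
print(norm_squared, det_O, normalized)
```

In this toy setting, the squared norm of the Slater determinant agrees with det(O), so dividing by its square root normalizes it, with no unitary transformation computed at all.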


Chapter 15
General Relativity: Einstein Equations


Here is an interesting application in general relativity: Einstein equations. To intro- 
duce them, we must use new features in linear algebra: tensors, and algebraic opera- 
tions between them. For this purpose, we must introduce a new principle: Einstein’s 
summation convention. It improves on the standard sums used in linear algebra. In 
fact, it tells us how to raise and lower indices, and come up with a coherent summa- 
tion strategy. Thanks to it, the nonlinear system of equations gets particularly easy 
to introduce. 

Indeed, thanks to tensors, we can model the curvature in spacetime. The key to 
the curvature is the metric that tells us how spacetime stretches, and in what direction 
[2, 6, 12, 17, 64]. 

Unfortunately, the metric is not yet known. Fortunately, Einstein equations tell us 
an important thing about it: it can never come from nothing. On the contrary: it must 
be produced from its physical source: the stress (energy momentum) tensor. This 
gives us a new system of equations for the unknown metric. To solve it numerically, 
expand the metric in terms of basis functions in space, and march in time. 


15.1 General Relativity—Some Background 


15.1.1 Flat Versus Curved Geometry 


In special relativity, we assumed no gravity at all. This is why a particle often flies 
undisturbed at a constant speed, and never accelerates or changes direction. Only at 
the end of Chap.4 did we see a force that acts on the particle. Still, even there, we 
only considered the initial time t = 0, before the force had any time to act. 

In real systems such as the solar system, on the other hand, gravity can no longer
be ignored. On the contrary: it must be explained by a new mathematical model,
independent of the coordinates that happen to be used. This is done in a new field:
general relativity. Here is some historical background.

The ancient Greeks introduced Euclidean geometry for one main purpose: to 
model static shapes in the two-dimensional plane—triangles, circles, and so on. 
Later on, this theory was also extended to the three-dimensional space. This was 
quite useful to calculate volume, surface area, and more. 

Newton, on the other hand, introduced a new time axis on top, to help model not 
only static but also dynamic shapes. This was indeed a breakthrough: a new force 
can now be applied to the original shape from the outside, to accelerate its original 
velocity, and even change its direction. 

This fits well in Plato’s philosophy. To refer to a geometrical shape (or just any 
general object), we must introduce a new word in our language, to represent not only 
one concrete instance but also the “godly” spirit behind all possible instances. This 
way, the word stands behind the concept it describes. Likewise, in physics, force 
stands behind the motion, and affects it from the outside. 

Newton viewed time as an external parameter, which makes a new (nonphysical) 
axis, orthogonal to the (physical) phase space, where the original motion takes place. 
Einstein, on the other hand, threw the time dimension back into the very heart of 
geometry. This way, time is not different from any other spatial dimension. Once 
the time axis is united with the original three-dimensional space, we have a new 
four-dimensional manifold: spacetime. 

This is more in the spirit of Aristotle’s philosophy. A word in our language takes 
its meaning not from the outside but from the very inside: the deep nature of the 
general object it stands for. 

In particular, energy and momentum are now not only physical but also geomet- 
rical: sources of mass, gravity, curvature, and symmetry. They determine the true 
metric in spacetime, telling us what a straight line really is: the shortest path between 
two events in spacetime. 

Still, because spacetime is a curved manifold, such a line may no longer look 
straight in the usual sense. After all, the time axis is no longer straight: the time scale 
may change from place to place. Near a massive star, for example, time may get 
slower. 


15.1.2 Gravitational Time Dilation


This is called gravitational time dilation. The light coming from a massive star has
a constant speed: c. Still, near the star, fewer seconds have passed than here, so the
light covered a shorter distance than here. This may also be called gravitational length
contraction: to us, distances near the star measure shorter than they would measure if
they were here. After all, fewer seconds have passed there than here. (Compare with
Chap. 4, Sects. 4.3.4–4.3.6.)


15.1.3 Gravitational Redshift


During this time, light not only travels, but also oscillates like a wave, at a constant
frequency. Still, fewer seconds have passed there than here, so the light wave had
less time to oscillate, and made fewer cycles than here. This is called gravitational
redshift: to us, the light coming from the star seems redder, with a lower frequency.
Thus, in our eyes, the star may look redder than it really is. A very massive star may
even look so red that it could hardly be seen at all.

What happens to a light ray that passes by a massive star? Well, as it gets closer 
and closer to the star, its time gets slower, so it makes a shorter distance than before. 
So, it must curve a little, and make a C-shape around the star. Around a star as massive 
and dense as a black hole, it can even spiral, and eventually “fall” right into the black 
hole, so we can never see it anymore! 


15.1.4 “Straight” Line 


In terms of our curved geometry in spacetime, this is still considered as a “straight” 
line. Indeed, in spacetime, to be straight means to follow a “valley” where time is 
as slow as possible. In such a valley, the light ray mostly remains in the slow-time 
region, saving time, and becoming “short” in spacetime. In terms of its own private 
(proper) time, this is indeed the fastest way. 

From our perspective, such a light ray may seem curved. From its own perspective, 
on the other hand, the light ray is as straight as ever. After all, its own proper time, 
measured in its own clock, remains as fast as before. Only our clock measures 
different time scales: slower near the star than here. 

A black hole is a special kind of star—so massive and dense that even light can’t 
escape from its gravitational force. Still, a light ray that approaches it would “feel” 
nothing unusual. Why? Because its own proper time still ticks at the same rate as 
before. 

After all, the light ray remains in a free fall, feeling no new force at all. Although
gravity acts upon it quite strongly, this affects it only in the eyes of an outer observer:
he/she will indeed observe that the light ray accelerates and gains more and more
momentum and kinetic energy on its way to the black hole. From its own self-system,
on the other hand, the light ray feels much calmer: there is no force or acceleration
or any change to its momentum or kinetic energy or time rate.


15.2 Metric in Spacetime


15.2.1 Spacetime 


Where does the motion take place? In classical mechanics, it takes place in space. In 
general relativity, on the other hand, it takes place in spacetime: together with the time 
dimension, this is a four-dimensional manifold. Each individual point in spacetime 
is called an event: a four-dimensional vector, specifying not only the spatial location 
but also the time. 

What coordinates to use in spacetime? For this purpose, we need four coordinates.
In our lab, we already have our own coordinates: t, x, y, and z. For the sake of
uniformity, these are often denoted by $x^0 = t$, $x^1 = x$, $x^2 = y$, and $x^3 = z$. These
are often indexed by a small Greek letter, say $\alpha = 0, 1, 2, 3$.


15.2.2 The Unknown Metric


At each individual event in spacetime, we have a metric g, telling us how spacetime
“stretches” at this event. In fact, g is a 4 × 4 matrix, depending on the event under
consideration. Its eigenvectors tell us in what directions spacetime stretches at each
individual event (Chap. 5, Sect. 5.8.10).

The metric g is not yet available. In fact, each entry (element) is an unknown function
in spacetime. In total, there are as many as 16 unknown entries. Fortunately, this
number can be reduced: since g is symmetric, there are actually just ten independent
entries: say, those in the upper triangular part. This way, we only seek those entries
$g_{m,n}$ indexed by $0 \le m \le n \le 3$.
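The count of ten can be confirmed by listing the index pairs in the upper triangular part. This is only a small illustrative sketch; the variable names are hypothetical.

```python
n = 4  # spacetime dimension

# All index pairs (m, k) with 0 <= m <= k <= 3: the upper triangular part of g.
pairs = [(m, k) for m in range(n) for k in range(m, n)]
print(len(pairs))
print(pairs)
```

In general, a symmetric n × n matrix has n(n + 1)/2 independent entries, which gives 10 for n = 4.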


15.2.3 Minkowski Metric and Riemann Normal Coordinates


What do we know in advance about the unknown metric g? Well, Einstein’s equiva- 
lence principle says that, at each individual event in spacetime, g could be approxi- 
mated locally by a constant (flat) metric: the hyperbolic Minkowski metric. Although 
this local approximation remains implicit and theoretical, it still tells us how smooth 
g really is. 

What does this mean geometrically? Well, it means that, at each particular event
in spacetime, g defines a smooth three-dimensional manifold that not only contains
the event but also has a tangent hyperboloid at the event (Chap. 6, Sects. 6.6.1–6.7.2).

The Minkowski metric has a simple form: for some unknown coordinates (not
necessarily our lab coordinates), it can be written as hyperbolic (Fig. 4.5). In this
sense, the unknown metric g is locally hyperbolic: at each event in spacetime, there
are some unknown coordinates (known as Riemann’s normal coordinates) for which
the metric is (nearly) hyperbolic. Unfortunately, Riemann’s normal coordinates have
only a theoretical value: they remain local and implicit, and can never be “tied” to
form a useful global coordinate system.


15.2.4 Gravity Waves 


There is, though, one exception. To detect gravity waves, we do assume that the 
true metric g is close enough to the Minkowski metric. This way, the Riemann 
normal coordinates may indeed take a more practical and global form. For this 
purpose, however, we must transform our original lab coordinates to more convenient 
coordinates. This is known as gauge transformation. Fortunately, there is no need to 
do this explicitly. Instead, it is sufficient to assume that the gauge conditions hold. 
This way, the original Einstein equations split into ten decoupled wave equations for
the ten unknown functions $g_{m,n}$ ($0 \le m \le n \le 3$).

Why not do this here? Because here we seek the true metric g, which (globally) 
may be completely different from the Minkowski metric. 


15.3 Symbols 


15.3.1 The Gradient Symbol 


Einstein equations contain all the information about the (yet unknown) metric g. 
Once they are solved, g is uncovered, as required. 

To introduce them, let’s differentiate the entries in g with respect to our four
coordinates. In fact, each entry has four partial derivatives. This makes the “gradient”
of g, which is still unknown as well. Let’s place it in a new 4 × 4 × 4 symbol, indexed
by three (lower) indices, $m, n, \alpha = 0, 1, 2, 3$:

$$\Gamma_{mn,\alpha} = \frac{\partial g_{mn}}{\partial x^{\alpha}} \quad (0 \le m, n, \alpha \le 3).$$

Here, $mn$ actually means “$m, n$”. We often drop the comma. Only in the gradient do
we keep the comma: “$,\alpha$” means a partial derivative with respect to $x^{\alpha}$ ($0 \le \alpha \le 3$).
For example, for $\alpha = 0$, this is a partial derivative with respect to $t = x^0$.

A symbol may be viewed as an extension of the concept of a matrix: it may use more
than two indices. In our context, it also depends on the event under consideration: it
may change from event to event. Unlike the tensor introduced later, a symbol is not
necessarily invariant under changing the coordinate system.

The convention is to use capital Greek letters to denote symbols like this. Let’s
define yet another 4 × 4 × 4 symbol:

$$\Theta_{\alpha mn} = \frac{1}{2}\left(\Gamma_{\alpha m,n} + \Gamma_{\alpha n,m} - \Gamma_{mn,\alpha}\right).$$

This will be useful later.


15.4 Einstein Summation Convention 


15.4.1 Lower and Upper Indices 


In linear algebra, we use lower indices only. This is good enough to denote vector 
components and matrix elements, and sum over any (lower) index in them. Here, on 
the other hand, we distinguish between lower and upper indices. This will help sum 
(or contract): an upper index could contract with a lower index only. 


15.4.2 The Inverse Metric


The inverse metric is often denoted by two upper indices:

$$g^{\mu\nu} = \left(g^{-1}\right)_{\mu\nu}.$$

Here, we use a small Greek letter as an index: $0 \le \mu, \nu \le 3$. Also, we often use
Einstein summation convention:


an index that appears twice, once as an upper index and another time as a lower
index, is summed over (contracted).


This way, no ‘$\sum$’ sign is needed anymore.
For example, in the identity matrix $I$, denote the individual elements by

$$\delta^{\mu}_{\kappa} = \begin{cases} 1 & \text{if } \mu = \kappa \\ 0 & \text{if } \mu \ne \kappa. \end{cases}$$


Thanks to Einstein summation convention, the sum of the main-diagonal elements
has a new short form, with no $\sum$:

$$\mathrm{trace}(I) = \delta^{\mu}_{\mu} = \sum_{\mu=0}^{3} \delta^{\mu}_{\mu} = 1 + 1 + 1 + 1 = 4.$$

This way, the ‘$\sum$’ sign is dropped, with the same meaning. We then
say that the upper and lower indices have been contracted with each other. We can
now combine the above conventions, and multiply g by its own inverse from the left:

$$g^{\mu\nu} g_{\nu\kappa} = \sum_{\nu=0}^{3} g^{\mu\nu} g_{\nu\kappa} = \sum_{\nu=0}^{3} \left(g^{-1}\right)_{\mu\nu} g_{\nu\kappa} = \delta^{\mu}_{\kappa}.$$

This way, $\mu$ remains an upper index, $\kappa$ remains a lower index, and $\nu$ is gone, as
required. In other words, the second (upper) index in $g^{\mu\nu}$ has been contracted with
the first (lower) index in $g_{\nu\kappa}$.

An upper index can be contracted only with a lower index, but not with an upper
index. This is why the above contraction is indeed legitimate.
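To see the contraction at work, one can spell out the hidden sum explicitly. The following is only a sketch: the Minkowski matrix with signature (-, +, +, +) is an assumed sample metric, chosen because it is its own inverse, and the helper name `contract` is hypothetical.

```python
N = 4

# Assumed sample metric: Minkowski, signature (-, +, +, +).
g_lower = [[0.0] * N for _ in range(N)]
for mu, value in enumerate([-1.0, 1.0, 1.0, 1.0]):
    g_lower[mu][mu] = value
g_upper = [row[:] for row in g_lower]  # this particular matrix is its own inverse


def contract(upper, lower):
    """result[mu][kappa] = sum over nu of upper[mu][nu] * lower[nu][kappa]."""
    return [[sum(upper[mu][nu] * lower[nu][kappa] for nu in range(N))
             for kappa in range(N)] for mu in range(N)]


delta = contract(g_upper, g_lower)              # g^{mu nu} g_{nu kappa}
trace = sum(delta[mu][mu] for mu in range(N))   # delta^mu_mu
print(delta)
print(trace)
```

The repeated index nu is the only one that is summed over; mu and kappa survive as the row and column of the result, which is the identity matrix, with trace 4.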


15.5 The Riemann Tensor 


15.5.1 A New Convention 


What does the above formula say? In terms of linear algebra, it simply says

$$\left(g^{-1} g\right)_{\mu\kappa} = I_{\mu\kappa}.$$

Still, this naive style uses lower indices only. Our new style, on the other hand, is
safer: it distinguishes between lower and upper indices, and protects from a wrong
contraction.

Still, in the naive style, let’s introduce a new convention: drop the parentheses!
This way,

$$g^{-1} g_{\nu\kappa} = g^{\mu\nu} g_{\nu\kappa}.$$

This way, applying $g^{-1}$ from the left means contracting with $\nu$, which is the first
lower index in g. Likewise, $g^{-1}$ could be applied not only to g but also to any other
matrix, and even to a bigger symbol like $\Theta$.


15.5.2 The Christoffel Symbol 


To get used to this new convention, let’s apply $g^{-1}$ not only to g but also to $\Theta$. For
this purpose, contract with the first lower index in $\Theta$. This defines a new symbol:

$$\Gamma^{\alpha}_{mn} = g^{-1}\Theta_{\beta mn} = g^{\alpha\beta}\Theta_{\beta mn}.$$

This raises $\alpha$ from a lower index in $\Theta$ to an upper index in $\Gamma$. Indeed, the dummy
index $\beta$ is contracted upon and gone.

This is the Christoffel symbol. It is the key to the curvature of spacetime, stored
in the Riemann tensor.
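The chain from the metric to its gradient, then to Θ, and finally to the Christoffel symbol can be followed numerically. The following is only a sketch under stated assumptions: the sample metric is flat spacetime in spherical coordinates (t, r, θ, φ), the gradient is approximated by central differences, and the inverse metric is taken entrywise, which is valid only because this sample metric is diagonal. The function names are hypothetical.

```python
import math

N = 4  # coordinates x[0..3] = (t, r, theta, phi)


def metric(x):
    """Assumed sample metric: diag(-1, 1, r^2, r^2 sin^2(theta))."""
    _, r, theta, _ = x
    diag = [-1.0, 1.0, r * r, (r * math.sin(theta)) ** 2]
    return [[diag[m] if m == n else 0.0 for n in range(N)] for m in range(N)]


def gradient(x, h=1e-5):
    """dg[m][n][a] approximates the partial derivative of g_{mn} with respect to x^a."""
    dg = [[[0.0] * N for _ in range(N)] for _ in range(N)]
    for a in range(N):
        forward = list(x)
        backward = list(x)
        forward[a] += h
        backward[a] -= h
        gf, gb = metric(forward), metric(backward)
        for m in range(N):
            for n in range(N):
                dg[m][n][a] = (gf[m][n] - gb[m][n]) / (2.0 * h)
    return dg


def christoffel(x):
    """gamma[a][m][n] = g^{ab} theta[b][m][n], for a diagonal metric only."""
    g, dg = metric(x), gradient(x)
    theta = [[[0.5 * (dg[a][m][n] + dg[a][n][m] - dg[m][n][a])
               for n in range(N)] for m in range(N)] for a in range(N)]
    # For a diagonal metric, the inverse is diagonal too: g^{aa} = 1 / g_{aa}.
    return [[[theta[a][m][n] / g[a][a] for n in range(N)]
             for m in range(N)] for a in range(N)]


point = [0.0, 2.0, 1.0, 0.5]
gamma = christoffel(point)
print(gamma[1][2][2], gamma[2][1][2])
```

At r = 2, the two printed entries should be close to the known values -r = -2 and 1/r = 0.5. A nondiagonal metric would need a genuine matrix inverse in place of the entrywise division.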


15.5.3 The Riemann Tensor


Let’s define a yet bigger 4 x 4 x 4 x 4 symbol, with four indices—an upper index, 
followed by three lower indices: 


Here, there are two terms. The first term indeed uses four indices, as required. In 
the second term, the dummy index is contracted upon and gone, leaving just four 
indices, as required. 

To define the Riemann tensor, just antisymmetrize $S$ in terms of $\sigma$ and $\nu$: subtract
the same, but with $\sigma$ and $\nu$ interchanged:

$$R^{\rho}_{\mu\sigma\nu} = S^{\rho}_{\mu\sigma\nu} - S^{\rho}_{\mu\nu\sigma}.$$


This is the key for the curvature in spacetime. 
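For reference, here is how the result looks once everything is written in terms of the Christoffel symbol and its gradient. This is the standard textbook expression, stated here under an assumption: index orderings and sign conventions differ between books, and the ordering below is chosen to match an upper index followed by three lower indices, antisymmetric in $\sigma$ and $\nu$. The comma denotes a partial derivative, and $\lambda$ is a dummy index.

```latex
R^{\rho}{}_{\mu\sigma\nu}
  = \Gamma^{\rho}_{\nu\mu,\sigma} - \Gamma^{\rho}_{\sigma\mu,\nu}
  + \Gamma^{\rho}_{\sigma\lambda}\,\Gamma^{\lambda}_{\nu\mu}
  - \Gamma^{\rho}_{\nu\lambda}\,\Gamma^{\lambda}_{\sigma\mu}.
```

Each subtracted term is the same as the term before it, but with $\sigma$ and $\nu$ interchanged. In particular, interchanging $\sigma$ and $\nu$ on the left-hand side just flips the sign.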


15.6 Einstein Equations in Vacuum


15.6.1 Vacuum and Curvature 


The Riemann tensor tells us the curvature: how curved spacetime is at each individual 
event in it. This curvature is shaped by the original mass distribution. It actually 
defines the true (nonflat) geometry in spacetime, and the true metric in it. This 
produces the desired system of equations, whose solution is the (still unknown) 
metric. 

For a start, let’s model gravity in vacuum: an empty domain, with no matter or 
energy or any other source of curvature. In this domain, the curvature should vanish. 

Of course, outside the domain, things may be different: there may be massive
stars (like black holes) that may curve spacetime globally, both outside the domain
and inside it. Still, they may affect our domain only indirectly: through boundary
conditions. In the interior of the domain, on the other hand, there should be no
curvature at all.

Does this mean that, in our domain, the Riemann tensor must vanish? No! If it
did, then the metric would be constant, and spacetime would be completely flat:
every free fall would follow a straight line, as in Minkowski space.

This may be good enough in special relativity, where no gravity is assumed at 
all. In general relativity, on the other hand, gravity curves spacetime. As a result, 
a free fall may follow not a straight but a curved geodesic: the shortest path in the 
underlying (nonflat) geometry. 

For instance, this is indeed why the Earth orbits the Sun: in terms of the curved 
geometry in the solar system, it “falls” not in a straight line but in an elliptic orbit. 
Although this is not straight in the usual sense, it is perfectly straight in the curved 
spacetime. 

In this example, what is our domain? Well, it doesn’t contain the Sun itself, but 
only the space outside it, which is assumed to be completely empty. This is indeed a 
perfect vacuum: the Earth itself doesn’t count, because this is the object in motion, 
upon which gravity acts. 

So, to model gravity in general, the Riemann tensor mustn’t vanish. This would
be too strict: it would throw us back to special relativity, losing all the interesting
effects of gravity. Surely, we wouldn’t like this to happen. To avoid this, something
else should vanish: not the original Riemann tensor, but a smaller tensor, the Ricci
tensor.


15.6.2 The Ricci Tensor 


What is the Ricci tensor? It is just the “trace” of the Riemann tensor, obtained by
contracting the upper index with the second lower index:

$$R_{\mu\nu} = R^{\rho}_{\mu\rho\nu}.$$


In the original Riemann tensor, the upper index is followed by three lower indices. 
The above definition actually contracts the first and third indices in the original 
Riemann tensor. This is indeed legitimate: the first index is upper, and the third index 
is lower. This way, these indices are now gone: the resulting Ricci tensor takes just 
two indices. 


15.6.3 Einstein Equations in Vacuum 


In vacuum, the Ricci tensor must vanish:

$$R_{\mu\nu} = 0.$$

Here, we had better keep the indices $\mu$ and $\nu$, and not drop them. This may help indicate
that this is the Ricci tensor $R_{\mu\nu}$, not the original Riemann tensor, which takes four
indices.

Because $R_{\mu\nu}$ depends on g, this indeed makes a system of equations in the
unknown metric g. These are Einstein equations in vacuum. Once they are solved, g
is indeed uncovered, as required.

How to solve them numerically? Well, on the left-hand side, the tensor depends
on g, which is not yet known. To solve for it numerically, expand it in terms of basis
functions:

$$g_{m,n} = \sum_{j=1}^{K} g^{(j)}_{m,n} \psi_j, \quad 0 \le m, n \le 3.$$

(This is a standard linear algebra sum: it uses no summation convention.) Once this
is plugged in, multiply each equation by $\psi_i$, and integrate in space. This gives 10K
equations in 10K new unknowns: the $g^{(j)}_{m,n}$’s. Still, they may change in time. To solve
for them, we must therefore march in time, time level by time level. For this purpose,
the t-partial derivative should be discretized by an (implicit) finite difference: the
current time level, minus the previous one, divided by the time step $\Delta t$.
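The implicit finite difference can be illustrated on a toy problem. The following is only a sketch: the scalar linear equation du/dt = -rate * u is an assumed stand-in for the (much bigger, nonlinear) discrete system, and the function name `march` is hypothetical.

```python
def march(u0, rate, dt, steps):
    """Implicit (backward Euler) marching for the toy equation du/dt = -rate * u.

    At each time level, (u_new - u_old) / dt = -rate * u_new is solved for u_new.
    """
    u = u0
    history = [u]
    for _ in range(steps):
        u = u / (1.0 + rate * dt)
        history.append(u)
    return history


levels = march(u0=1.0, rate=2.0, dt=0.5, steps=3)
print(levels)
```

Here, the unknown at the current time level appears on both sides, and is solved for in closed form. In the nonlinear system above, this step would call for an iterative solver at each time level instead.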

So far, we’ve considered Einstein equations in vacuum. This is good enough in 
the study of black holes and the solar system. After all, nobody is interested in the 
interior of a star, but only in the empty space outside it. 

But this is no longer good enough in cosmology. In this field, we can no longer 
assume that our domain is empty. On the contrary: we often use a very large scale—we 
average on the stars and galaxies in the entire universe, and consider them as a homo- 
geneous dust, with a uniform density, and no pressure at all. Thus, we can’t assume 
vacuum anymore: the universe is full of matter and energy. To model this, Einstein 
equations must also take a nonzero right-hand side: the stress (energy momentum) 
tensor. 


15.7 Einstein Equations—General Form 


15.7.1 The Stress (Energy Momentum) Tensor 


Cosmology is one example in which the vacuum can no longer be assumed. On the
contrary: the right-hand side is now a nonzero tensor: the stress (energy momentum)
tensor $T_{\mu\nu}$, which often takes the simple form

$$T_{\mu\nu} = V_{\mu\nu} + p g_{\mu\nu} \quad (0 \le \mu, \nu \le 3),$$

where the scalar p and the tensor V may depend on g, $g^{-1}$, and the field itself in a
simple way. For simplicity, we often drop the indices $\mu$ and $\nu$:

$$T = pg + V.$$


This simple form appears in a few important models:


1. In a scalar field, p is the Lagrangian: the kinetic energy minus the potential
energy (as in a harmonic oscillator). This way, p depends linearly on $g^{-1}$.

2. In electromagnetics, on the other hand,

$$p = \frac{1}{4}\,\mathrm{trace}\left(g^{-1}Fg^{-1}F\right) \quad \text{and} \quad V = -Fg^{-1}F,$$

where F is the antisymmetric tensor that stores the electric field $(E_1, E_2, E_3)$
and the magnetic field $(B_1, B_2, B_3)$:

$$F = \begin{pmatrix} 0 & E_1 & E_2 & E_3 \\ -E_1 & 0 & -B_3 & B_2 \\ -E_2 & B_3 & 0 & -B_1 \\ -E_3 & -B_2 & B_1 & 0 \end{pmatrix}.$$

This leads to the Einstein-Maxwell equations.


3. Finally, in cosmology, on a very large scale, the stars and galaxies are averaged,
and viewed as a perfect fluid, with pressure p. (In Friedmann’s theory, in particular,
they are considered as dust, with no pressure at all: p = 0.) In this model, we
also assume spatial isotropy: at each individual point in space, every direction
looks the same. This way, V has just one nonzero entry:

$$V = \begin{pmatrix} \rho + p & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix},$$

where $\rho$ is the energy density. (Don’t confuse it with the index $\rho$ used in the
Riemann tensor above!) Moreover, we also assume symmetry and homogeneity:
in every point in space, everything is the same: $\rho = \rho(t)$ and $p = p(t)$. This
way, the universe must be a highly symmetric three-dimensional manifold:


• Either a “closed” (compact) hypersphere, with a (constant) positive spatial
curvature,

• Or an “open” (noncompact) hyperboloid, with a negative curvature,

• Or a flat (and open) Euclidean space, with no curvature at all. (From observations,
this option is most likely.)


As the universe expands, the matter gets more and more spread out, and the energy
density decreases: $d\rho/dt < 0$. In reality, however, the universe may contain not
only dust but also various kinds of radiation, with a nonzero pressure, proportional
to their own energy density. For example, even if the universe were completely
empty, it would still contain “vacuum” energy, with $p = -\rho$, $d\rho/dt = dp/dt = 0$,
and V = 0. Because it has a constant $\rho$ and p, this “dark” energy is also called
the cosmological constant. Still, the above models are too ideal: in a more realistic
cosmological model, the universe should be a mix of dust, dark energy, and
other kinds of radiation together.
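The electromagnetic model above can be checked numerically in the flat case. The following is only a sketch under stated assumptions: the metric is Minkowski with signature (-, +, +, +), the units are such that no physical constants appear, and the field components are made-up numbers. Under these assumptions, the time-time entry of T = V + pg should come out as the familiar energy density, half the sum of the squared electric and magnetic fields.

```python
N = 4
E = [1.0, 2.0, 3.0]    # made-up electric field components
B = [0.5, -1.0, 2.0]   # made-up magnetic field components

# Assumed metric: Minkowski, signature (-, +, +, +); it is its own inverse.
g = [[0.0] * N for _ in range(N)]
for mu, value in enumerate([-1.0, 1.0, 1.0, 1.0]):
    g[mu][mu] = value
g_inv = [row[:] for row in g]

# The antisymmetric field tensor F, laid out as in the text.
F = [[0.0,    E[0],  E[1],  E[2]],
     [-E[0],  0.0,  -B[2],  B[1]],
     [-E[1],  B[2],  0.0,  -B[0]],
     [-E[2], -B[1],  B[0],  0.0]]


def matmul(a, b):
    """Plain matrix product of two N x N matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


M = matmul(g_inv, F)                          # g^{-1} F
MM = matmul(M, M)                             # g^{-1} F g^{-1} F
p = 0.25 * sum(MM[i][i] for i in range(N))    # (1/4) trace(g^{-1} F g^{-1} F)
FgF = matmul(F, M)                            # F g^{-1} F
V = [[-FgF[m][n] for n in range(N)] for m in range(N)]
T = [[V[m][n] + p * g[m][n] for n in range(N)] for m in range(N)]

energy_density = T[0][0]
expected = 0.5 * (sum(e * e for e in E) + sum(b * b for b in B))
print(p, energy_density, expected)
```

The last two printed values should agree. With a curved metric, `g_inv` would have to be computed as a genuine matrix inverse.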


15.7.2 The Stress Tensor and Its Trace


Now, let’s take T, and apply $g^{-1}$ to it from the left:

$$g^{-1}T = pI + g^{-1}V.$$

Here, I is the 4 × 4 identity matrix, so trace(I) = 4. Therefore,

$$\mathrm{trace}\left(g^{-1}T\right) = 4p + \mathrm{trace}\left(g^{-1}V\right).$$

This scalar can now be multiplied by $g_{mn}$ from the right, with two new indices,
m and n:

$$\mathrm{trace}\left(g^{-1}T\right) g_{mn} = 4p\,g_{mn} + \mathrm{trace}\left(g^{-1}V\right) g_{mn}.$$
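As a quick check of the trace formula, here is a small numerical sketch for the perfect fluid. The assumptions are not from the text: the metric is Minkowski with signature (-, +, +, +), and the energy density and pressure are made-up numbers. The helper name `trace_with_inverse` is hypothetical.

```python
N = 4
rho, p = 3.0, 0.5   # made-up energy density and pressure

# Assumed metric: Minkowski, signature (-, +, +, +); it is its own inverse.
g = [[0.0] * N for _ in range(N)]
for mu, value in enumerate([-1.0, 1.0, 1.0, 1.0]):
    g[mu][mu] = value
g_inv = [row[:] for row in g]

# Perfect fluid: V has just one nonzero entry.
V = [[0.0] * N for _ in range(N)]
V[0][0] = rho + p
T = [[p * g[m][n] + V[m][n] for n in range(N)] for m in range(N)]


def trace_with_inverse(matrix):
    """trace(g^{-1} M) = sum over mu, nu of g^{mu nu} M_{nu mu}."""
    return sum(g_inv[mu][nu] * matrix[nu][mu]
               for mu in range(N) for nu in range(N))


left = trace_with_inverse(T)
right = 4.0 * p + trace_with_inverse(V)
print(left, right, 3.0 * p - rho)
```

All three printed values should agree: in this flat example, the trace reduces to three times the pressure minus the energy density.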


15.7.3 Ricci Scalar 


So far, we’ve considered a few kinds of tensors. The Riemann tensor takes four
indices. The Ricci tensor, on the other hand, takes just two indices. The Ricci scalar,
in turn, has no index at all:

$$R = \mathrm{trace}\left(g^{-1}R_{\mu\nu}\right).$$


15.7.4 Einstein Tensor 


We are now ready to define the Einstein tensor: take the Ricci tensor, and subtract half
of the Ricci scalar times g:

$$G_{\mu\nu} = R_{\mu\nu} - \frac{1}{2} R g_{\mu\nu}.$$

(Don’t confuse $G_{\mu\nu}$ with Newton’s constant G, which has no index at all!)
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15.7.5 Einstein Equations—General Form 


We are now ready to write Einstein equations in their general form, with a nonzero
tensor on the right-hand side as well. This way, they are relevant not only in vacuum
but also in matter or radiation, as required:

$$G_{\mu\nu} = 8\pi G\,T_{\mu\nu}.$$

(Distinguish between $G_{\mu\nu}$ on the left, a tensor with two indices, and G on the right,
which stands for Newton’s gravity constant, and takes no index at all.)

What do we have in this system? On the right-hand side, the tensor is available.
On the left-hand side, on the other hand, the tensor depends on g, which is not yet
known. How to solve for it numerically? Again, expand it in space as

$$g_{m,n} = \sum_{j=1}^{K} g^{(j)}_{m,n} \psi_j, \quad 0 \le m, n \le 3.$$

(Recall that this is a mere linear algebra sum, with no summation convention.) Now,
plug this in, multiply each equation by $\psi_i$, and integrate in space. This way, we have
a discrete (nonlinear) system for the $g^{(j)}_{m,n}$’s. Still, they may depend on time. Thus,
to solve for them, we must march in time. For this purpose, discretize the t-partial
derivative: say, take the current time level, subtract the previous one, and divide by
$\Delta t$.


15.7.6 The Trace-Subtracted Form 


In the final Einstein equations, apply $g^{-1}$ to both sides:

$$g^{-1}G_{\mu\nu} = g^{-1}R_{\mu\nu} - \frac{1}{2}R\,g^{-1}g_{\mu\nu} = g^{-1}R_{\mu\nu} - \frac{1}{2}R\,I = 8\pi G\,g^{-1}T_{\mu\nu}.$$

Now, take the trace of both sides:

$$\mathrm{trace}\left(g^{-1}R_{\mu\nu}\right) - \frac{1}{2}R\cdot\mathrm{trace}(I) = R - \frac{1}{2}R\cdot 4 = -R = 8\pi G\cdot\mathrm{trace}\left(g^{-1}T_{\mu\nu}\right).$$

In summary, we now have the Ricci scalar in a more explicit form:

$$R = -8\pi G\cdot\mathrm{trace}\left(g^{-1}T_{\mu\nu}\right).$$

Let’s plug this in Einstein equations, and divide by $8\pi G$:

$$\frac{1}{8\pi G}R_{\mu\nu} + \frac{1}{2}\,\mathrm{trace}\left(g^{-1}T\right) g_{\mu\nu} = T_{\mu\nu}.$$

This is called the trace-subtracted form.
What do we have in this system? On the right-hand side, we have a known tensor.
On the left-hand side, on the other hand, the tensor depends on the metric g, which
is still unknown. To solve for it, use the same numerical technique as before.
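As a consistency check, the trace-subtracted form should reduce to the vacuum equations of Sect. 15.6.3 once the stress tensor vanishes. The short chain below spells this out, using only the formulas of this section:

```latex
T_{\mu\nu} = 0
\;\Longrightarrow\;
R = -8\pi G \cdot \mathrm{trace}\left(g^{-1}T_{\mu\nu}\right) = 0
\;\Longrightarrow\;
\frac{1}{8\pi G}\,R_{\mu\nu} = 0
\;\Longrightarrow\;
R_{\mu\nu} = 0.
```

So, in vacuum, the Ricci scalar vanishes as well, and only the Ricci tensor is left in the equation.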


15.8 Exercises 


1. Let A be a 4 × 4 matrix:

$$A = \left(a_{i,j}\right)_{0 \le i,j \le 3}.$$

Apply $g^{-1}$ to it from the left. For this purpose, contract with $\nu$, which is the first
lower index in A:

$$g^{-1}a_{\nu\kappa} = g^{\mu\nu}a_{\nu\kappa}.$$

Hint: the dummy index $\nu$ is contracted upon and gone. This is our new convention
in Sect. 15.5.1.
2. Is this a legitimate contraction? Hint: $\nu$ indeed appears twice: as an upper index
in $g^{\mu\nu}$, and also as a lower index in $a_{\nu\kappa}$, as required.
3. Show that this new style could actually be obtained from the naive linear algebra
style: just introduce parentheses back again:

$$g^{-1}a_{\nu\kappa} = \left(g^{-1}A\right)_{\mu\kappa} = \sum_{\nu=0}^{3}\left(g^{-1}\right)_{\mu\nu}a_{\nu\kappa} = \sum_{\nu=0}^{3} g^{\mu\nu}a_{\nu\kappa} = g^{\mu\nu}a_{\nu\kappa}.$$

Hint: use Einstein summation convention, and write the inverse of g with two
upper indices (Sect. 15.4.2).
4. What is so good about this new style? Hint: thanks to it, $g^{-1}$ can now be applied
not only to a matrix like g but also to a bigger symbol like $\Theta$, to help define the
Christoffel symbol:

$$\Gamma^{\alpha}_{mn} = g^{-1}\Theta_{\beta mn} = g^{\alpha\beta}\Theta_{\beta mn}$$

(Sect. 15.5.2).
5. In the latter formula, what happened to the index $\beta$? Hint: it has been contracted
upon and gone. This is why it can no longer be found in $\Gamma$.
6. Is this a legitimate contraction? Hint: $\beta$ is indeed an upper index in $g^{\alpha\beta}$, and a
lower index in $\Theta$, as required.

7. Why is $\alpha$ an upper index in $\Gamma$? Hint: because it is an upper index in $g^{\alpha\beta}$, and
should remain an upper index in $\Gamma$ as well.
8. Give yet another example of applying $g^{-1}$ in the new style. Hint: apply $g^{-1}$ to
the Ricci tensor, and take the trace:

$$R = \mathrm{trace}\left(g^{-1}R_{\mu\nu}\right).$$

This defines the Ricci scalar (Sect. 15.7.3).
9. What information is stored in the Riemann tensor? Hint: the entire curvature in
spacetime.
10. Does it make sense to require that it vanishes? Hint: this is too strict: it would
lead to a flat metric, which allows no gravity at all.
11. Does it make more sense to require that the Ricci tensor vanishes? Hint: yes, this
gives Einstein equations in vacuum.
12. Give an example of a domain that has only vacuum in it. Hint: in the solar system,
consider the space outside the Sun.
13. In this domain, doesn’t the Earth itself violate the vacuum assumption? Hint: it
is considered as a mere point. After all, it is the object in motion, upon which
gravity acts.
14. In this domain, how could the Earth feel any gravity? Hint: it feels the Sun’s
gravity through the boundary conditions at the Sun’s surface.
15. Why does the Earth orbit the Sun? Hint: in the curved geometry in the solar
system, the Earth’s orbit is considered as a “straight” line in spacetime. This
way, the Earth actually “falls” freely along its orbit.
16. Look at a massive star in the sky. Does the time on it tick faster or slower than
your own time here? Hint: slower. This is gravitational time dilation.
17. Does the diameter of the star look shorter or longer than if the star were here and
its diameter had been measured here? Hint: shorter. This is gravitational length
contraction.
18. Is the speed of light the same there as here? Hint: yes, the above effects cancel
each other. On the star, due to gravitational time dilation, fewer seconds have passed
than here. Still, due to gravitational length contraction, distances measure shorter
on the star than they would measure if they were here.

What color does the star have? Hint: due to gravitational redshift, it looks redder 
than it would look here. 
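
The hints on the Ricci scalar and on gravitational time dilation can be checked
numerically. The following Python sketch is an illustration only, not the book's
own method. It assumes a sphere of radius a with metric
g = diag(a^2, a^2 sin^2(theta)) and Ricci tensor diag(1, sin^2(theta)), in the sign
convention where a sphere has positive curvature. It also assumes the Schwarzschild
factor sqrt(1 - 2GM/(r c^2)) for the rate of a clock at rest at radius r outside a
spherical mass M. The function names and the rounded physical constants are
hypothetical choices made here.

```python
import math


def inverse_2x2(m):
    """Invert a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("singular matrix: the metric must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def ricci_scalar(g, ricci):
    """Return trace(g^{-1} Ric) for a 2 x 2 metric g and Ricci tensor Ric."""
    g_inv = inverse_2x2(g)
    # trace(A B) = sum over i, j of A[i][j] * B[j][i]
    return sum(g_inv[i][j] * ricci[j][i] for i in range(2) for j in range(2))


def sphere_metric(a, theta):
    """Assumed metric of a sphere of radius a in (theta, phi) coordinates."""
    return [[a * a, 0.0], [0.0, (a * math.sin(theta)) ** 2]]


def sphere_ricci(theta):
    """Assumed Ricci tensor of that sphere (positive-curvature convention)."""
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]


def clock_rate(mass, radius, G=6.674e-11, c=2.998e8):
    """Assumed Schwarzschild factor: seconds that pass on a clock at rest at
    `radius` (in meters) from a mass (in kg), per second far away."""
    return math.sqrt(1.0 - 2.0 * G * mass / (radius * c * c))


if __name__ == "__main__":
    # Ricci scalar of a sphere of radius 2, sampled at theta = 0.7
    print(ricci_scalar(sphere_metric(2.0, 0.7), sphere_ricci(0.7)))
    # Clock rate at the Sun's surface (rounded solar mass and radius)
    print(clock_rate(1.989e30, 6.957e8))
```

For the sphere, the trace comes out as 2/a^2, whatever theta is. For the Sun, the
clock rate is slightly below 1, in line with the hint that time ticks slower on a
massive star. Under the same assumption, light emitted there reaches a faraway
observer with its wavelength stretched by the inverse factor, which is the redshift.
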
Moebius transformation, 119, 146, 167, 172, 175
as matrix, 148 
composition, 149, 176 
continuous-, 146 
inverse-, 147 
invertible-, 147
not invertible-, 147
pole, 147 
Moment of inertia, 82, 87 
Momentum, 232 
angular-, 67, 256 
conservation of, 85 
conservation of, 122 
linear-, 66 
matrix, 123, 236 
nondeterministic-, 258 
relative-, 124 
scalar-, 255 
tensor, 412 
vector-, 255 
Monomial 
computing, 271, 275 
Moon, 221 
Motion 
curved-, 219 
in computer graphics, 219 
Multidimensional 
hypersphere, 191 
vector, 7 
Multilevel 
hierarchy, 359 
refinement, 328 
refinement (see refinement), 359 
Multiplying 
2-d vector by scalar, 5 
matrices, 15, 150 
matrix by scalar, 13, 148 
matrix-vector, 13 
polynomial by scalar, 269 
polynomials, 269 
Multiscale 
decomposition, 35 


N 
Natural 
number, 274 
Neutron, 263 
Newton, 404 
constant, 415 
gravity constant, 415 
law of nature, 103 
mechanics, 103, 122, 231 
Nodal function, 369 
Nonconvex, 330 
Nondeterministic, 233 
energy, 251 
momentum, 258 
Nonsingular, 178 
matrix, 1, 54, 143, 149, 167, 194, 228, 374, 375
Norm 
of vector, 23 
of vector product, 66 
Normal 
coordinate, 407 
derivative, 290 
high-order, 291 
order of-, 291 
second-, 291 
in computer graphics, 221 
in projective geometry, 206, 213, 221 
in rotation, 221 
in translation, 219 
of tangent plane, 206, 213 
subgroup, 163, 170 
vector, 290 
in angle, 349 
in mesh, 336 
Normalized, 24 
eigenvector, 29 
vector, 29 
Nuclear energy, 126, 129 
Null space, 29, 89, 246 
Number, 187 
complex-, 9 
natural-, 274 
operator, 244, 247 
wave-, 32 
Numerical 
analysis, 325 
application, 325 
approximation, 369 
integration, 325, 359 


O
Object 
abstract-, 145 
in Euclidean geometry, 187 
algebraic-, 1, 143, 145, 154, 187, 268 
analytic-, 145, 268 
geometrical-, 145, 187 
infinity-, 200 
in group, 154, 166 
Oblique 
plane, 169-171 
projection, 171, 202, 224 
Observable, 233, 239 
degenerate-, 259 
Off-diagonal, 19, 34 
One-to-one 
homomorphism, 165
isomorphism, 165 
mapping, 154 
Onto, 154 
Open 
half-, 43 
interval, 32, 36 
Operation (see: operator or arithmetic) 
inverse-, 155, 157 
union-, 170 
Operator 
bilinear-, 56 
differential-, 291 
differentiation-, 124 
ladder-, 245, 257 
number-, 244, 247 
Opposite points (see: antipodal), 188 
Orbital, 381 
canonical form, 395 
orthogonal-, 393, 395 
orthonormal-, 393, 395 
Order 
in search engine, 97 
in the internet, 97 
in the web, 97 
of matrix, 18 
of normal derivative, 291 
of partial derivative, 283, 293 
Origin, 7, 364 
in 2-d, 4 
in 3-d, 6 
of mapping, 153 
Orthogonal 
columns, 27 
decomposition, 68 
matrix, 28, 34, 38, 40, 222 
orbitals, 393, 395 
plane, 206 
projection, 57 
transformation, 62 
vectors, 26, 33, 36, 40, 206, 221, 290 
Orthonormal, 221 
columns, 27 
orbitals, 393, 395 
vector, 26, 33, 40 
Oscillator, 246 
Overlap 
matrix, 393 
Slater determinant, 394 


P 
Pair, 3, 285 
Parabola, 212
Paraboloid, 211 
Paradox 
twin-, 114 
Parallel 
in computer graphics, 219 
in projective geometry, 219 
in translation, 219 
line and plane, 292 
lines, 187, 209, 292 
Parallelogram, 137 
determinant, 53 
diagonal in-, 5 
in vector product, 66 
rule, 4, 83
Parameter 
in analytic geometry, 189 
in computer graphics, 218-220 
in continued fraction, 176 
in derivative, 280, 286 
in matrix, 55, 146, 173 
in projective plane, 193 
in transformation, 146, 173 
Partial 
derivative, 280, 286, 363 
chain rule, 302 
high-order, 292 
mixed-, 283, 292 
order of-, 283, 293 
second-, 283, 294 
third-, 283 
Particle, 91 
elementary-, 261 
Path 
discrete-, 220, 221 
in relativity, 119 
Pauli, 384 
exclusion, 235, 264 
matrix, 263 
Perfect fluid, 413 
Period 
in continued fraction, 185 
Periodic 
continued fraction, 185 
matrix, 46 
Permutation, 384 
cycle, 385, 400 
group, 386 
number, 387 
product, 400 
switch, 384, 400 
Perron-Frobenius theory, 96
Perpendicular (see: orthogonal, normal), 290
force, 132
velocity, 120 
Phase, 235 
space, 404 
Photon, 137, 235, 264 
Physics (see: mechanics) 
geometrical-, 66 
Planck, 238, 380 
Plane, 205 
Cartesian-, 4, 187, 188 
complex-, 193 
extended-, 146, 192 
decomposition, 168 
family, 169 
horizontal-, 169, 171 
hyper-, 225, 228 
tangent-, 228 
in projective space, 225, 227 
invariant-, 219 
oblique-, 169, 171 
orthogonal-, 206 
parallel-, 292 
projective- 
real-, 201 
set, 169 
tangent-, 206, 213, 227 
union of-, 170 
vertical-, 169 
Plato, 404 
Point 
antipodal-, 85, 189, 190, 207, 225 
fixed-, 175 
in 3-d, 6 
infinity-, 108, 146, 204, 207, 208 
in projective plane, 208, 210, 213 
in projective space, 225, 227 
joint-, 208, 210 
opposite- (see: antipodal), 188 
reference-, 360 
Poisson distribution, 252 
Polar 
angle, 362 
coordinates, 362 
decomposition, 195, 235, 296 
Polarization, 264 
Pole, 147 
Polynomial, 267 
adding, 268 
binary-, 274, 276 
characteristic-, 90, 391 
composition 
Horner algorithm, 273 
composition of, 273 
computing, 271
Horner algorithm, 273 
degree of, 285, 294 
linear-, 278 
multiplying, 269 
multiplying by scalar, 269 
of 2 variables, 279 
of 3 variables, 285 
quadratic-, 278 
root of, 90, 392 
Position, 232 
matrix, 234 
scalar-, 255 
super-, 236 
vector-, 255 
Positive 
definite, 39, 87, 374 
semidefinite, 38, 81 
Potential 
electrostatic-, 382 
energy, 382, 397, 413 
Potential energy, 126, 129 
Precession, 75 
Primitive function, 277 
Principal 
axis system, 81 
coordinates, 81 
Principle 
exclusion-, 235, 264 
Heisenberg-, 242 
uncertainty, 242 
Probability, 91 
matrix, 92, 94 
uniform, 92 
Problem 
ranking-, 97 
two-body, 85 
Process 
limit-, 178 
Product 
group-, 194 
Hartree-, 382 
inner-, 22, 25, 57, 290 
real-, 23, 205, 305 
matrix-, 178, 185 
matrix-vector, 150 
permutation-, 400 
scalar-, 23 
tensor-, 252 
vector-, 58, 66, 210, 226 
triple-, 83, 226 
Projection, 207, 215 
cotangent-, 171
matrix, 28
oblique-, 171, 202, 224
orthogonal-, 57
radial-, 203, 225
Projective
geometry, 187
duality, 210, 215, 226
line, 199
mapping, 218, 305
plane
real-, 201
space, 224, 305
transformation, 218, 305
Proof
by contradiction, 30
Proper time, 112, 134
Proportion
in vector, 5, 24
Proton, 263
Pythagoras
theorem, 10, 68, 189


Q 


Quadratic 
equation, 189 
function, 211 
polynomial, 278 
Quantum 
dynamics, 234, 250 
mechanics, 231, 414 
Quotient group (see: factor), 164 


R 
Radius, 189 
of ball, 364 
projection, 203, 225 
spectral-, 90, 94 
Random variable, 240, 382 
covariance, 241 
expectation, 240, 245 
variance, 241 
Ranking problem, 97 
Ratio 
ball-, 354 
in continued fraction, 178 
Real 
function, 205, 267 
inner product, 23, 205, 305 
matrix, 21, 201, 218, 224, 228 
projective 
line, 199 
plane, 201
space, 224, 305 
vector, 23 
Reciprocal, 376 
Recursion 
in polynomial, 271 
Recursive (see: recursion) 
algorithm, 274 
Redshift, 405 
Reference 
coordinate, 359 
point, 360 
Refinement 
boundary-, 334 
in 3-d, 328, 334, 359 
iterative-, 328, 359 
local-, 328 
multilevel-, 328, 359 
step, 328 
Reflexive, 159, 168 
Regularity, 329, 347, 355 
adequacy, 352 
estimate, 351 
mesh-, 329, 347, 355 
minimal angle, 348 
numerical results, 355 
of tetrahedron, 347 
Regular (see: nonsingular), 54 
Relation (see: equivalence) 
in a set, 159 
reflexive-, 159, 168 
symmetric-, 159, 168 
transitive-, 159, 168 
Relative, 231 
axis system, 63 
energy, 124 
momentum, 124 
time, 232 
Relativity 
general-, 403, 410 
special-, 103, 410 
Remainder, 70 
in sine decomposition, 35 
in steady state, 96 
Representation, 196 
binary-, 274 
decimal-, 274 
group-, 145, 149, 175, 187 
matrix- (see: group), 145 
Rest mass, 127, 130, 132 
Ricci 
scalar, 414 
tensor, 411 


Richardson, 366 
Riemann, 146 
coordinates, 407 
geometry, 207 
normal coordinates, 407 
tensor, 410 
Right-hand 
rule, 59, 341 
side, 272 
system, 70, 79 
Root 
of polynomial, 90, 392 
of unity, 42 
Rotation, 69, 77, 221 
axis system, 69 
coordinate, 71 
Earth, 74, 221 
group, 79 
inverse-, 78 
matrix, 63, 65, 78 
Row 
index, 270 
in matrix, 12, 15 
number, 15 
vector, 293 
Rule 
chain-, 302, 305, 307 
Cramer-, 55, 111, 137, 305 
Leibnitz-, 307 
parallelogram-, 4, 83 
right-hand-, 59, 341 
trapezoidal-, 278 


S 
Scalar, 4 
multiplying a matrix, 13, 148 
multiplying a polynomial, 269 
multiplying a vector, 5 
product, 23 
Ricci-, 414 
Scale, 35, 238 
Schmidt, 63, 339 
Schrodinger, 234 
Schur 
complement, 375 
matrix, 375 
Schwarz, 40 
Search 
engine, 97 
the internet, 97 
the web, 97 
Self 
adjoint matrix (see: Hermitian), 22
coordinate, 108 
system, 107 
time, 134 
Semicircle, 201 
Semispace, 202 
Sequence, 3 
Series 
Taylor, 296 
truncated-, 296 
Set 
decomposition, 160 
disjoint-, 160 
equivalence, 160 
in a group, 162 
relation, 159 
level-, 115, 205, 211, 329, 333 
mapping, 153 
of equivalence classes, 160 
of planes, 169 
subset of-, 96, 154, 163 
vector-, 168, 201 
Zero 
measure, 365 
volume, 365 
Side 
in tetrahedron, 291 
Sine 
decomposition, 35 
derivative, 296 
discrete-, 36 
mode, 31 
Taylor 
polynomial, 296 
series, 296 
transform, 34 
wave, 31 
Singular, 365 
matrix, 29, 53, 179 
Singularity, 361 
Skew-symmetric, 23 
Slater, 384, 394 
Solar system, 412
Space
Cartesian-, 6, 58, 190
3-d, 3, 6, 58
function-, 370
function-, 370 
linear-, 1, 4, 231, 265 
Minkowski-, 114 
null-, 29, 89, 246 
phase-, 404 
projective-, 224, 305 
semi-, 202 
time, 103, 117, 404
vector-, 3, 8, 143, 265 
Spacetime, 103, 117, 404 
metric, 404 
Special relativity, 103, 410 
Spectral, 35 
decomposition, 35, 95 
radius, 90, 94 
Spectrum, 90, 94 
Speed (see: velocity) 
of light, 105 
Sphere, 193, 231 
coordinates, 362 
divided-, 204 
general-, 191, 194 
hemi-, 196 
hyper-, 191, 196 
multidimensional-, 191 
Riemann-, 146 
unit-, 190 
Spin, 261 
Spline, 325, 369 
B-, 299 
Square 
matrix, 18, 22, 25, 28, 29 
Standing wave, 248, 250 
Star, 404 
State, 92, 235 
coherent-, 250 
dynamic-, 250 
ground-, 238, 248 
physical-, 233 
steady-, 96 
stochastic-, 92 
Steady state, 96 
Stiffness, 372 
Stochastic, 92, 231 
flow, 92 
state, 92 
Stress, 412 
Subgroup, 157, 194 
center-, 158, 168 
kernel-, 158, 165 
normal-, 163, 170 
Subset, 96, 154, 163 
Sun, 221 
Superposition, 236 
Surface, 282 
Switch, 384, 400 
Symmetric 
anti-, 410, 413 
Hessian, 294 
matrix, 18, 294, 374, 375 
relation, 159, 168
skew-, 23 
Symmetrization, 239 
System 
axis-, 211 
principal-, 81 
relative-, 63 
rotating-, 69 
closed-, 85, 126 
coordinate- 
relative-, 63 
isolated-, 85, 126 
lab-, 108 
passive-, 132 
right-hand, 70, 79 
self-, 107 


T 
Tangent 
derivative, 292 
hyperplane, 228 
in computer graphics, 219, 220 
in projective geometry, 219, 227 
line, 213 
plane, 206, 213, 227 
Tangential derivative, 292 
Taylor 
polynomial, 296 
series, 296 
Tensor 
anti-symmetric, 410, 413 
Einstein-, 414 
electromagnetic fields, 413 
energy-, 412 
momentum-, 412 
Ricci-, 411 
Riemann-, 410 
stress-, 412 
symmetric 
anti-, 410, 413 
Tensor product, 252 
Tetrahedron 
general-, 299 
integral over, 301 
mesh of, 327 
regularity, 347 
side (see: side), 291 
unit- 
integral over, 288 
volume of, 359 
Theorem 
fundamental-, 166, 278, 281
Gersgorin-, 98
isomorphism-, 166, 197 
Pythagoras, 10, 68, 189 
Theory 
group-, 143 
Time, 106 
axis, 106 
dilation, 113, 117, 134 
gravitational-, 404 
event, 117 
proper-, 112, 134 
relative-, 232 
self-, 134 
space-, 103, 117, 404
Topology, 193, 196, 200, 201, 225
Total
energy, 126, 247
Trace, 392
subtracted, 415
Trajectory, 219
in relativity, 119
Transform
cosine-, 38
Fourier-, 48
sine-, 34
Transformation
affine-
in 3-d, 348
composition, 110, 149, 176
gauge-, 407
identity-, 109, 174
invariant-, 62, 109
inverse-, 111, 147
invertible-, 173
Lorentz-, 107, 110, 127
Moebius- (see: Moebius), 146
orthogonal-, 62
projective-, 218, 305
Transitive, 159, 168
Translation, 218
matrix, 219
Transpose (see: matrix, vector, tensor), 17, 388
conjugate-, 21 
gradient, 287, 293 
minor, 54 
vector, 293 

Trapezoidal rule, 278 
Triangle 
inequality, 40 
unit- 
integral over, 281 
Triangular 
lower-, 182
part, 18 
upper-, 183, 305 
part, 18 
Tridiagonal 
matrix, 34 
Twin paradox, 114 
Two-body problem, 85 
Two-dimensional (see: dimension), 168 


U
Uncertainty
principle, 242 
Uniform 
grid, 32, 36 
Union 
of planes, 170 
operation, 169 
Unit 
atomic-, 380 
circle, 189 
closed, 100 
open, 100 
cube, 330 
closed-, 343 
eigenvector, 29 
element, 151 
in factor group, 164 
in homomorphism, 155 
in Moebius group, 167 
interval 
integration, 279 
open-, 32, 36 
matrix, 19 
sphere, 190 
tetrahedron 
integral over, 288 
triangle, 281 
vector, 24, 29, 34, 221, 289 
in 3-d, 56 
standard-, 25, 56, 77, 150, 178, 226, 
236 
Unitary, 394 
Unitary matrix, 28, 48 
Unity 
root of, 42 
Universe, 137, 412 
closed-, 413 
compact-, 413 
Euclidean-, 413 
flat-, 413 
noncompact-, 413 
open-, 413
Upper triangular, 18, 183, 305 


V
Vacuum 
Einstein equations, 411 
energy, 414 
Variable 
independent-, 268, 279, 285 
in polynomial, 268, 279, 285 
random-, 240, 382 
covariance, 241 
expectation, 240, 245 
variance, 241 
three-, 285 
two-, 279 
Variance, 241 
Variational, 369 
Vector, 3, 6 
antipodal-, 200, 203 
column-, 12, 26, 300 
complex-, 11 
component of-, 3 
coordinate of-, 3 
cosine of-, 331 
2-d, 3 
gradient, 281 
3-d, 6 
gradient, 286 
dimension of-, 3 
field, 287 
function, 281, 287 
infinite-, 234 
inner product, 22, 24, 57, 290 
real-, 23, 205, 305 
norm, 23 
normal-, 206, 221 
in angle, 349 
in mesh, 336 
normalized-, 24, 29 
orthogonal-, 26, 33, 36, 40, 206, 221 
orthonormal-, 26, 33, 40, 221 
product, 58, 66, 210, 226 
triple-, 83, 226 
projection, 57 
real-, 23 
row-, 293 
set, 168, 201 
space, 3, 8, 143, 265 
transpose of-, 293 
unit- (see: unit vector), 29, 34 
Vacuum, 410
Velocity, 70
adding-, 104
angular-, 69
perpendicular-, 120
Vertex
in tetrahedron, 299
Vertical
coordinate, 4
line, 282
plane, 169
Visualization, 191, 193, 211, 218 


W

Wave, 137 
amplitude, 35, 235 
cosine-, 36 
discrete-, 32, 36 
electromagnetic-, 264
equation, 407 
exponent-, 43 
frequency, 32 
function, 235, 380 
gravity-, 407 
interference, 235 
number, 32
phase, 235
sine-, 31
standing-, 248, 250 
superposition, 236 


Web, 97 
Weighted graph, 91 


Z
Zero
measure, 365
volume, 365 


